Friday, March 2, 2012

Defeating Packers for Static Analysis of Malicious Code

I doubt whether there is anybody in either AV industry or among reverse engineers who does not know what a software packer is (for those who don't - this article may help). Malware research and reverse engineering forums are full of packers' related questions, descriptions thereof, unpacking suggestions and links to both packers and unpackers. In short - people have been doing a lot of precious work on defeating packers and protectors.

However, for those of us who are not afraid of static analysis, there is an easier way (I'd dare to say "generic") to handle packers and protectors and retrieve the unpacked form of the executable (cannot hold my self from adding a note for nerds: no, this does not include reversing virtual machines like Oreans' one. This is up to you). So, the main problem is obtaining the unpacked version of code as all the rest may be well handled from there. What we actually need is a dump of unpacked executable. There are lots of memory dumping programs, but some protectors "know" how to handle them, therefore, this article explains a simple and short way of obtaining such dump without teasing the implemented protections.

Knock Knock
First of all, we need to, let's say, get into the process. There are at least two ways to do that:

  • Use the OpenProcess Windows API with, preferably, PROCESS_ALL_ACCESS and read/write from/to process' memory.
  • Inject our code into the process' memory space (simply a code injection or a DLL injection).
My preference is the second one as you mostly have more power operating from inside.

There are several ways to inject a DLL into another process, e.g. calling LoadLibraryA as a remote thread in the victim process or even this one (my preference is the second one again). This is in deed the easiest part. My personal suggestion would be to create a suspended process, inject your DLL and then resume the main thread of the created process as this provides you with greater flexibility.

Set the Trap
No, I am not referring to 0xCC (trap to debugger). Trap, in this case, means something that would trigger the dumper embedded into the injected DLL and cause it to dump the unpacked image, for example, patch one of the API functions with a jmp instruction, which would redirect the execution flow to where we want. Be careful with this approach, as your patch may be well "overpatched" by the protection mechanism of the target executable. Let me give you a couple of suggestions: never patch the first bytes of the API (I assume this code is not a production code, so you may let it be bound to your version of Windows); patch as deep as possible - meaning leave the kernel32.dll alone and go further to ntdll.dll where possible. 

For example, if your target executable outputs a string to a console, that may be a good idea to patch either WriteConsoleA or WriteConsoleW API  function. However, it may be an even better idea to go deeper and patch WriteConsoleInternal (Win7) and install your notification jump there. Once that API is called - chances are that the executable has been fully unpacked. As an alternative, you may simply create a new thread in your DLL and Sleep it for several milliseconds (or even seconds) and then dump the memory.

You may perform these actions in the DllMain of your injected DLL, on the other hand, you may create a separate procedure for this, but then you'd have to use this approach or something similar.

Dumper Triggered
No matter how (either by the API patch or our thread) the dumper is triggered. Sure thing - we are not going to dump the whole memory allocated by the process. We just need the image. The easiest way of getting the information on ImageBase and SizeOfImage of the target module (usually the main module of the process) is to find the corresponding entry in PEB (you may want to check the "Hiding Injected DLL in Windows" post to get more information on PEB and related structures). However, it is important to mention that you HAVE to be careful with that, as the information in PEB may be altered by the protection scheme of the victim executable. Having found the base address and the size of the image, just write the content of that memory region to a file (make sure to take note of image's base address if you are dumping DLL). Quite simple, isn't it? Well, not really. You have to check for memory protection of every region you are currently going to dump as it may have either PAGE_WRITE or PAGE_EXECUTE access rights only, meaning that you cannot access it for reading. Once done with this, you may either let the program execution to continue or terminate the process.

In addition, it is strongly recommended to suspend all the threads of the process, except the thread our code is running in.

Using Dump
Nothing's easier - load the dump into IDA Pro and see how good it handles it.

P.S. Suspending/Resuming Threads
Suspending threads is a bit annoying as you have to get the IDs of all the threads currently running in the system, then select those with process ID of your process and suspend then. The same procedure is applicable for resuming suspended threads.

First of all CreateToolhelp32Snapshot (MSDN):

HANDLE WINAPI CreateToolhelp32Snapshot(
              DWORD wdFlags,
              DWORD th32ProcessID

You have to specify TH32_SNAPTHREAD as flags in order to get threads information. If the return value is not NULL, the you may proceed with Thread32First:

BOOL WINAPI Thread32First(
            HANDLE          hSnapshot,
            LPTHREADENTRY32 lpte

followed by subsequent calls to Thread32Next (has the same arguments) until the return value is FALSE.

The functions Thread32First and Thread32Next fill the THREADENTRY32 structure which has the following format:

typedef struct tagTHREADENTRY32
   DWORD dwSize; /* Should be set to sizeof(THREADENTRY32) prior 
                    to calling Thread32First */
   DWORD cntUsage;
   DWORD th32ThreadID;
   DWORD th32OwnerProcessID;
   LONG  tpBasePri;
   LONG  tpDeltaPri;
   DWORD dwFlags;

Fields of interest for you would be the th32OwnerProcessID and the th32ThreadID. Compare the th32OwnerProcessID with the ID of the process (previously obtained with GetCurrentProcessId()) your code is running in. If those values are equal, then you have to open the thread with:

              DWORD dwDesiredAccess, /* Would be THREAD_ALL_ACCESS */
              BOOL  bInheritHandle, /* FALSE */
              DWORD dwThreadId      /* th32ThreadID */

Then suspend the thread with:

DWORD WINAPI SuspendThread( HANDLE hThread );

while passing the handle obtained with OpenThread().
You have to resume threads once you have saved the dump by calling:

DWORD WINAPI ResumeThread( HANDLE hThread );

Don't forget to close each thread handle with CloseHandle.

That's it. Hope this post was helpful (at least I used this method a lot).
See you at the next.


  1. Could you please example in the Visual Basic computing language. It is my understanding that the Visual Basic is the most advanced soft, so I am wondering why this artikal is talk about something not known well and not good.

    1. You are serious, aren't you?
      Although, it is possible to do all of the above in VB, but why would you want to "go from Paris to London via Tokyo"?

      "It is my understanding that the Visual Basic is the most advanced soft" - it depends on advanced in what area and for whom.

      "so I am wondering why this artikal is talk about something not known well and not good" - as far as I know, Windows API is documented pretty well. C and C++ are known better and used more then VB. As for "not good" - could you elaborate?

    2. He is obviously trolling you.

    3. That's what I thought initially, but you never know. Remember Albert Einstein saying "Two things are infinite: the Universe and ..." :-)


Note: Only a member of this blog may post a comment.