Wednesday, May 30, 2012

CreateRemoteThread. Bypass Windows 7 Session Separation

The Internet is full of programmers' forums, and those forums are full of questions about the CreateRemoteThread Windows API function not working on Windows 7 (when trying to inject a DLL). The lucky posters usually get redirected to the MSDN page dedicated to this API, which says: "Terminal Services isolates each terminal session by design. Therefore, CreateRemoteThread fails if the target process is in a different session than the calling process." The usual advice that follows is: start the process from your injector as suspended, inject your DLL and then resume the process' main thread. This works... Most of the time... But sometimes you really need to inject your code into a running process. Isn't there a way to do that? Well, there is. As a matter of fact, it is so easy that I originally decided not to attach my source code to this article (mainly because I am too lazy to make it look readable :) ). It appears that I am not the only lazy one here :), so I have uploaded the source code after all.

Let me start as usual, with a note for nerds in order to avoid meaningless comments and stupid discussions. 
The code provided within the article is for example purposes only. Error checks have been omitted on purpose. Yes, there may be another, probably even better, way of doing this. No, manual DLL mapping is not better unless you have plenty of time and nothing to do with it.

All others, let's get to business :)


Opening the Victim Process

This is the easiest part. At this stage you will see whether you are able to inject your code or not (in case of a system process, for example). Nothing unusual here - you simply invoke the good old OpenProcess API

HANDLE WINAPI OpenProcess(
       DWORD dwDesiredAccess, /* in our case PROCESS_ALL_ACCESS */
       BOOL  bInheritHandle, /* no need, so FALSE */
       DWORD dwProcessId /* self explanatory enough */
);

which opens the process specified by dwProcessId and returns a handle to that process, or NULL if you do not have sufficient rights to access it.


Reading the Shellcode

What you usually see in the examples of shellcode over the internet is an unsigned char array of hexadecimal values somewhere in the C code. This helps to keep the number of files small, but is not really comfortable to deal with. I decided to store the shellcode in a separate binary file, produced with FASM (Flat Assembler):

use32
   ; offset of the LoadLibraryA address within the shellcode
   dd    func
   ; save all registers
   push  eax ebx ecx edx ebp edi esi
   ; get your EIP
   call  next
next:
   pop   eax
   mov   ebx, eax
   ; get the address of the DLL name
   mov   eax, string - next
   ; do this to avoid possible negative values (due to sign extend)
   movzx eax, al
   add   eax, ebx
   ; pass it to the LoadLibraryA API
   push  eax
   ; get the address of the LoadLibraryA function
   mov   eax, func - next
   movzx eax, al
   add   eax, ebx
   mov   eax, [eax]
   ; call LoadLibraryA
   call  eax
   ; restore registers
   pop   esi edi ebp edx ecx ebx eax
   ; return
   ret
func     dd 0x12345678 ; placeholder for the address
string:

Compiling this code with FASM.EXE will produce a raw binary file, where all offsets are 0-based. There are some parts in the code above that may require additional explanation (for example, why it does not end with ExitThread()). I am aware of this and I will provide you with the explanation a little bit later.

For now, allocate an unsigned char buffer for your shellcode. Make this buffer large enough to contain the shellcode and the name of the DLL with its terminating zero (my assumption is that you passed that name as a command line parameter to your injector).

Once you have read the shellcode into that buffer, append the name of the DLL (which may be a full path to the DLL) to the end of the shellcode with, for example, the memcpy() function. Half done with it. Now we still have to "tell" the shellcode where the LoadLibraryA API function is located in memory. Fortunately, the load address randomization in Windows is far from being perfect: the base addresses of system DLLs may vary between reboots, but until the next reboot they are the same for all processes of the same bitness. This means that, just as in usual DLL injection, we obtain the address of this API in our own (32-bit) process by calling good old GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA") and save it to the "func" variable of the shellcode. Because our shellcode may vary in size from time to time (that depends on the needs), we saved the offset of that variable in the first four bytes of the shellcode, which eliminates the need to hardcode the offset. Simply do the following:

*(unsigned int*)(shellcode_ptr + *(int*)(shellcode_ptr)) = (unsigned int)LoadLibraryA_address;

Our shellcode is ready now.
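
The buffer handling described above can be sketched in portable C. This is only an illustration under a few assumptions: the helper name build_payload and its parameters are hypothetical (they are not part of the uploaded source), the host is little-endian like the x86 target, and the LoadLibraryA address is passed in as a plain 32-bit value obtained elsewhere.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap buffer holding the shellcode followed
   by the NUL-terminated DLL path, with the "func" placeholder patched.
   The caller frees the buffer. Returns NULL on bad input or allocation failure. */
unsigned char *build_payload(const unsigned char *code, size_t code_len,
                             const char *dll_path,
                             uint32_t load_library_addr,
                             size_t *out_len)
{
    uint32_t func_off;
    size_t path_len;
    unsigned char *buf;

    if (code == NULL || dll_path == NULL || out_len == NULL)
        return NULL;
    if (code_len < 2 * sizeof(uint32_t))
        return NULL;

    /* The first four bytes of the shellcode hold the offset of "func". */
    memcpy(&func_off, code, sizeof func_off);
    if (func_off > code_len - sizeof(uint32_t))
        return NULL; /* offset would point outside the shellcode */

    path_len = strlen(dll_path) + 1; /* include the terminating zero */
    buf = malloc(code_len + path_len);
    if (buf == NULL)
        return NULL;

    memcpy(buf, code, code_len);                /* the shellcode itself       */
    memcpy(buf + code_len, dll_path, path_len); /* lands at the "string" label */
    memcpy(buf + func_off, &load_library_addr,  /* overwrite the placeholder  */
           sizeof load_library_addr);

    *out_len = code_len + path_len;
    return buf;
}
```

Using memcpy() instead of the pointer cast shown earlier gives the same result on x86 and also avoids unaligned access on hosts that care about it.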


"Create remote thread" without CreateRemoteThread()

As the title of this paragraph suggests - we are not going to use the CreateRemoteThread(). In fact, we are not going to create any thread in the victim process (well, the injected DLL may, but the shellcode won't).


Code Injection

Surely, we need to move our shellcode into the victim process' address space in order to load our library. We do it in the same manner as we would copy the name of the DLL in a regular DLL injection procedure:
  1. Allocate memory in the remote process with
    LPVOID WINAPI VirtualAllocEx(
       HANDLE hProcess, /* the handle we obtained with OpenProcess */
       LPVOID lpAddress, /* preferred address; may be NULL */
       SIZE_T dwSize, /* size of the allocation in bytes */
       DWORD  flAllocationType, /* MEM_COMMIT | MEM_RESERVE */
       DWORD  flProtect /* PAGE_EXECUTE_READWRITE */
    );
    This function returns the address of the allocation in the address space of the victim process or NULL if it fails.
  2. Copy the shellcode into the buffer we've just allocated in the address space of the victim process:
    BOOL WINAPI WriteProcessMemory(
       HANDLE   hProcess, /* same handle as above */
       LPVOID   lpBaseAddress, /* address of the allocation */
       LPCVOID  lpBuffer, /* address of the local buffer with the shellcode */
       SIZE_T   nSize, /* size of the shellcode together with the appended
                          NULL-terminated string */
       SIZE_T   *lpNumberOfBytesWritten /* if this is zero - check your code */
    );
    If the return value of this function is nonzero, we have successfully copied our shellcode into the victim process' address space. It is also a good idea to check that the value returned in lpNumberOfBytesWritten equals nSize.

Make It Run
So, we have copied our shellcode. The only thing left is to make it run, but we cannot use the CreateRemoteThread() API... The solution is a bit more complicated.

First of all, we have to suspend all threads of the victim process. In general, suspending only one thread is enough, but, as we cannot know for sure what is going on there, we should suspend them all. There is no specific API that would provide us with the list of threads for a specified process. Instead, we have to create a snapshot with CreateToolhelp32Snapshot, which provides us with the list of all currently running threads of all processes running in the system:

HANDLE WINAPI CreateToolhelp32Snapshot(
   DWORD dwFlags, /* TH32CS_SNAPTHREAD = 0x00000004 */
   DWORD th32ProcessID /* in this case may be 0 */
);

This function returns the handle to the snapshot (or INVALID_HANDLE_VALUE on failure), which contains information on all present threads. Once we have this, we "iterate through the list" with the Thread32First and Thread32Next API functions:

BOOL WINAPI Thread32First(
   HANDLE hSnapshot, /* the handle to the snapshot */
   LPTHREADENTRY32 lpte /* pointer to the THREADENTRY32 structure */
);

The Thread32Next has the same prototype as Thread32First.

typedef struct tagTHREADENTRY32{
   DWORD dwSize; /* size of this struct; you have to initialize this field before use */
   DWORD cntUsage; 
   DWORD th32ThreadID; /* use this value to open thread for suspension */
   DWORD th32OwnerProcessID; /* compare this value against the PID of the victim 
                              to filter out threads of other processes */
   LONG  tpBasePri;
   LONG  tpDeltaPri;
   DWORD dwFlags;
} THREADENTRY32, *PTHREADENTRY32;

For each THREADENTRY32 with matching th32OwnerProcessID, open it with OpenThread() and suspend with SuspendThread:

HANDLE WINAPI OpenThread(
   DWORD dwDesiredAccess, /* THREAD_ALL_ACCESS */
   BOOL  bInheritHandle, /* FALSE */
   DWORD dwThreadId /* th32ThreadID field of THREADENTRY32 structure */
);

and

DWORD WINAPI SuspendThread(
   HANDLE hThread /* Obtained by OpenThread() */
);

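Don't forget to CloseHandle(openedThread) :) Just keep the handle of the one thread you are going to redirect open until you are done with its CONTEXT.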

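Take the first thread once it is opened and suspended (actually, you can do that with any thread that belongs to the victim process) and get its CONTEXT (see "Community Additions" here) using the GetThreadContext API. Set the ContextFlags field of the CONTEXT structure (CONTEXT_FULL will do) before the call, otherwise the register values are not filled in: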

BOOL WINAPI GetThreadContext(
   HANDLE    hThread, /* handle to the thread */
   LPCONTEXT lpContext /* pointer to the CONTEXT structure */
);

Now that all the threads of the victim process are suspended, we may do our job. The idea is to redirect the execution flow of this thread to our shellcode, but in such a way that the shellcode returns to where the suspended thread currently is. This is not a problem at all, as we have the CONTEXT of the thread. The following code does that just fine:

/* "push" the current EIP of the thread onto its stack, so that the ret instruction in the shellcode returns the execution flow to this address (for a thread that was waiting, this is usually somewhere inside a wait function such as WaitForSingleObject) */
ctx.Esp -= sizeof(unsigned int);
WriteProcessMemory(victimProcessHandle, 
                   (LPVOID)ctx.Esp, 
                   (LPCVOID)&ctx.Eip,
                   sizeof(unsigned int),
                   &bytesWritten);
/* Set the EIP to our injected shellcode; do not forget to skip the first four bytes */
ctx.Eip = remoteAddress + sizeof(unsigned int);
/* Apply the modified context to the suspended thread; without this call the changes above stay in our local copy only */
SetThreadContext(openedThread, &ctx);
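
To see why the final ret in the shellcode lands back where the thread was interrupted, here is a small self-contained simulation of the stack manipulation above. It uses a local byte array in place of the victim's stack and touches no other process. The names mini_ctx, simulate_redirect and simulate_ret are hypothetical and exist only for this illustration; a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two CONTEXT fields used by the technique. */
struct mini_ctx {
    uint32_t Esp;
    uint32_t Eip;
};

/* Models the redirect: stack_mem represents stack_size bytes of memory
   starting at address stack_base. Returns 1 on success, 0 if the new
   stack slot would fall outside the modelled region. */
int simulate_redirect(struct mini_ctx *ctx, unsigned char *stack_mem,
                      uint32_t stack_base, size_t stack_size,
                      uint32_t remote_address)
{
    if (ctx->Esp < stack_base + sizeof(uint32_t) ||
        ctx->Esp - stack_base > stack_size)
        return 0;

    /* "push" the interrupted EIP: make room, then store it at the new top */
    ctx->Esp -= sizeof(uint32_t);
    memcpy(stack_mem + (ctx->Esp - stack_base), &ctx->Eip, sizeof(uint32_t));

    /* skip the four-byte offset header at the start of the shellcode */
    ctx->Eip = remote_address + sizeof(uint32_t);
    return 1;
}

/* Models the ret instruction: pop a dword from the stack into EIP. */
void simulate_ret(struct mini_ctx *ctx, const unsigned char *stack_mem,
                  uint32_t stack_base)
{
    memcpy(&ctx->Eip, stack_mem + (ctx->Esp - stack_base), sizeof(uint32_t));
    ctx->Esp += sizeof(uint32_t);
}
```

After simulate_redirect() followed by simulate_ret(), both Esp and Eip hold their original values again. That is the property the shellcode relies on, since it saves and restores the general purpose registers around the LoadLibraryA call.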

Almost there. All we have to do now is resume the previously suspended threads with ResumeThread() in the same manner (iterating with Thread32First and Thread32Next over the same snapshot handle), and then close the snapshot handle with CloseHandle().

Don't forget to close the victim process' handle with CloseHandle() ;)


Shellcode

After all this, the execution flow in the selected thread of the victim process reaches our shellcode, whose source code should be pretty clear now. It simply calls the LoadLibraryA() API function with the name/path of the DLL we want to inject.

One important note - it is a bad practice to do anything "serious" inside the DllMain() function. My suggestion is to create a new thread in DllMain() and do all the work there, so that DllMain() may return safely.

Hope this article was helpful.

Have fun injecting and see you at the next.




Friday, March 2, 2012

Defeating Packers for Static Analysis of Malicious Code

I doubt there is anybody in either the AV industry or among reverse engineers who does not know what a software packer is (for those who don't - this article may help). Malware research and reverse engineering forums are full of packer-related questions, descriptions thereof, unpacking suggestions and links to both packers and unpackers. In short - people have been doing a lot of precious work on defeating packers and protectors.

However, for those of us who are not afraid of static analysis, there is an easier way (I'd dare to say "generic") to handle packers and protectors and retrieve the unpacked form of the executable (cannot keep myself from adding a note for nerds: no, this does not include reversing virtual machines like the Oreans one. This is up to you). So, the main problem is obtaining the unpacked version of the code, as all the rest may be well handled from there. What we actually need is a dump of the unpacked executable. There are lots of memory dumping programs, but some protectors "know" how to handle them. Therefore, this article explains a simple and short way of obtaining such a dump without teasing the implemented protections.

Knock Knock
First of all, we need to, let's say, get into the process. There are at least two ways to do that:

  • Use the OpenProcess Windows API with, preferably, PROCESS_ALL_ACCESS and read/write from/to process' memory.
  • Inject our code into the process' memory space (simply a code injection or a DLL injection).
My preference is the second one as you mostly have more power operating from inside.

There are several ways to inject a DLL into another process, e.g. calling LoadLibraryA as a remote thread in the victim process or even this one (my preference is the second one again). This is indeed the easiest part. My personal suggestion would be to create a suspended process, inject your DLL and then resume the main thread of the created process, as this provides you with greater flexibility.

Set the Trap
No, I am not referring to 0xCC (trap to debugger). Trap, in this case, means something that would trigger the dumper embedded into the injected DLL and cause it to dump the unpacked image, for example, patch one of the API functions with a jmp instruction, which would redirect the execution flow to where we want. Be careful with this approach, as your patch may be well "overpatched" by the protection mechanism of the target executable. Let me give you a couple of suggestions: never patch the first bytes of the API (I assume this code is not a production code, so you may let it be bound to your version of Windows); patch as deep as possible - meaning leave the kernel32.dll alone and go further to ntdll.dll where possible. 

For example, if your target executable outputs a string to a console, it may be a good idea to patch either the WriteConsoleA or the WriteConsoleW API function. However, it may be an even better idea to go deeper and patch WriteConsoleInternal (Win7) and install your notification jump there. Once that API is called, chances are that the executable has been fully unpacked. As an alternative, you may simply create a new thread in your DLL, Sleep it for several milliseconds (or even seconds) and then dump the memory.

You may perform these actions in the DllMain of your injected DLL, on the other hand, you may create a separate procedure for this, but then you'd have to use this approach or something similar.

Dumper Triggered
No matter how the dumper is triggered (either by the API patch or by our thread), we are certainly not going to dump the whole memory allocated by the process. We just need the image. The easiest way of getting the ImageBase and SizeOfImage of the target module (usually the main module of the process) is to find the corresponding entry in PEB (you may want to check the "Hiding Injected DLL in Windows" post to get more information on PEB and related structures). However, it is important to mention that you HAVE to be careful with that, as the information in PEB may be altered by the protection scheme of the victim executable. Having found the base address and the size of the image, just write the content of that memory region to a file (make sure to take note of the image's base address if you are dumping a DLL). Quite simple, isn't it? Well, not really. You have to check the memory protection of every region you are about to dump (VirtualQuery reports it), as a region may be marked PAGE_NOACCESS, carry the PAGE_GUARD modifier or be nominally execute-only (PAGE_EXECUTE), meaning that you cannot simply read it. In that case, either change the protection with VirtualProtect first or skip the region. Once done with this, you may either let the program continue or terminate the process.
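
The protection check can be expressed as a small predicate over the State and Protect values that VirtualQuery returns in MEMORY_BASIC_INFORMATION. The sketch below is only an illustration: the function name region_is_readable is hypothetical, and the constants are repeated with their Windows SDK values (guarded by #ifndef) so that the fragment builds without <windows.h> as well.

```c
#include <stdint.h>

/* Values as defined in the Windows SDK; skipped when <windows.h> is included. */
#ifndef PAGE_NOACCESS
#define PAGE_NOACCESS          0x01
#define PAGE_READONLY          0x02
#define PAGE_READWRITE         0x04
#define PAGE_WRITECOPY         0x08
#define PAGE_EXECUTE           0x10
#define PAGE_EXECUTE_READ      0x20
#define PAGE_EXECUTE_READWRITE 0x40
#define PAGE_EXECUTE_WRITECOPY 0x80
#define PAGE_GUARD             0x100
#define MEM_COMMIT             0x1000
#endif

/* Hypothetical helper: returns 1 if a region with the given state and
   protection can be read directly, 0 if it must be skipped or have its
   protection changed first. */
int region_is_readable(uint32_t state, uint32_t protect)
{
    if (state != MEM_COMMIT)
        return 0; /* reserved or free pages have no content to dump */
    if (protect & PAGE_GUARD)
        return 0; /* touching a guard page raises an exception */

    switch (protect & 0xFF) { /* strip modifier bits such as PAGE_NOCACHE */
    case PAGE_READONLY:
    case PAGE_READWRITE:
    case PAGE_WRITECOPY:
    case PAGE_EXECUTE_READ:
    case PAGE_EXECUTE_READWRITE:
    case PAGE_EXECUTE_WRITECOPY:
        return 1;
    default:
        return 0; /* PAGE_NOACCESS, PAGE_EXECUTE or an unknown value */
    }
}
```

A dumper would call such a predicate for every region inside the image range and, for regions that fail it, either skip them or temporarily relax the protection.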

In addition, it is strongly recommended to suspend all the threads of the process, except the thread our code is running in.

Using Dump
Nothing's easier - load the dump into IDA Pro and see how well it handles it.

P.S. Suspending/Resuming Threads
Suspending threads is a bit annoying, as you have to get the IDs of all the threads currently running in the system, then select those with the process ID of your process and suspend them. The same procedure is applicable for resuming suspended threads.

First of all CreateToolhelp32Snapshot (MSDN):

HANDLE WINAPI CreateToolhelp32Snapshot(
              DWORD dwFlags,
              DWORD th32ProcessID
       );

You have to specify TH32CS_SNAPTHREAD as flags in order to get thread information. If the return value is not INVALID_HANDLE_VALUE, then you may proceed with Thread32First:

BOOL WINAPI Thread32First(
            HANDLE          hSnapshot,
            LPTHREADENTRY32 lpte
            );

followed by subsequent calls to Thread32Next (has the same arguments) until the return value is FALSE.

The functions Thread32First and Thread32Next fill the THREADENTRY32 structure which has the following format:

typedef struct tagTHREADENTRY32
{
   DWORD dwSize; /* Should be set to sizeof(THREADENTRY32) prior 
                    to calling Thread32First */
   DWORD cntUsage;
   DWORD th32ThreadID;
   DWORD th32OwnerProcessID;
   LONG  tpBasePri;
   LONG  tpDeltaPri;
   DWORD dwFlags;
} THREADENTRY32, *PTHREADENTRY32;

Fields of interest for you would be the th32OwnerProcessID and the th32ThreadID. Compare the th32OwnerProcessID with the ID of the process your code is running in (previously obtained with GetCurrentProcessId()). If those values are equal and the thread is not the one your code is running in (compare th32ThreadID against GetCurrentThreadId()), open the thread with:

HANDLE WINAPI OpenThread(
              DWORD dwDesiredAccess, /* Would be THREAD_ALL_ACCESS */
              BOOL  bInheritHandle, /* FALSE */
              DWORD dwThreadId      /* th32ThreadID */
              );

Then suspend the thread with:

DWORD WINAPI SuspendThread( HANDLE hThread );

while passing the handle obtained with OpenThread().
You have to resume threads once you have saved the dump by calling:

DWORD WINAPI ResumeThread( HANDLE hThread );

Don't forget to close each thread handle with CloseHandle.


That's it. Hope this post was helpful (at least I used this method a lot).
See you at the next.