Tuesday, October 4, 2011

Stealth Import of Windows API

In the good old days, memory was an expensive resource and developers had to take care of the size of the programs they created. Imagine how hard they had to work before there were high-level languages (like C), before compilers became smart enough to handle size optimization for them. Speed was also among the concerns, as the hardware was not as fast as it is now. Another headache was the need to interact with the underlying operating system or, to be more precise, the need to implement the interfaces yourself (in pre-libc times). Modern operating systems provide a built-in mechanism for that, called the API (Application Programming Interface). This mechanism is a blessing and a curse in one. On one hand, it greatly simplifies interaction with the OS; on the other hand, it makes your software more vulnerable to hackers and/or malware. In some cases the usage of APIs simply gets excessive.

Let us check that with a simple example - the well known "Hello World" application in C created with Microsoft Visual C++ 2010 Express. The size of the executable is 27 kilobytes. The image of the executable has 7 sections while it could well be implemented with only two sections (one for code and one for data). 

The import directory looks even more exciting. Well, MSVCRT.dll is unavoidable as it is the C language interface to the Windows operating system. But there are 28 APIs imported from KERNEL32.dll and most of them seem to be placed here by mistake. Take GetTickCount, for example. Do we care about timing when we only want to output a single string and leave? No, we do not.

Anyway, the issue of the compiler's heuristics is outside the scope of this article. Let's concentrate on the API functions. In general, the API is a great thing: it lets us develop applications without implementing every single interaction with the operating system ourselves, and it saves us a lot of time. Good on one hand, but bad on the other. The list of API functions imported by a piece of software gives a clear picture of what that software is intended to do and, even more importantly, how. This may be good when you deal with malware research, but not as good when you are trying to protect your legitimate software from being hacked.

Unfortunately, there are thousands of software products that use the IsDebuggerPresent API as their only protection mechanism. Isn't this ridiculous? No, it is not. It is rather sad, I'd say. Of course, there are numerous packers/cryptors/protectors out there, but the problem is that the better known your solution is, the more vulnerable it gets. There are some linkers that provide import section obfuscation, but again, the problem is that they are known.

One of the possible solutions for this problem is the Stealth Import of APIs. This is a simple, powerful but underestimated technique. There are many developers, most developers, I should say, who believe that it is impossible to create and, even more importantly, launch a Windows executable without any imports at all. "You need to import KERNEL32.dll at least!" - they would say. Unfortunately, not all of us are aware of the fact that both NTDLL.dll and KERNEL32.dll are automatically mapped into the process's address space regardless of the executable's import table. Obviously, having them loaded in memory makes it possible to locate any API function and to load any library should there be a need for it. We may not know it, but the operating system itself provides us with all the tools we may need for that.

Get Handle of KERNEL32.DLL or NTDLL.DLL
Normally, if we need to get a handle to a certain module that we know is loaded in memory, we simply call the GetModuleHandle API function, or LoadLibrary in those cases where the module is not loaded yet. But this is not the normal situation. We do not have access to those functions (yet). What should we do? The answer is simple - SEH, the Structured Exception Handling mechanism (remember we said that the OS provides us with everything?).

All we need to do is get the address of the first exception handler in the chain of handlers. This chain is accessible through the first entry in the TIB (Thread Information Block), which is pointed to by [FS:0]. This is as simple as

   ;Get the initial exception handler
   mov eax,[fs:0]

We now have the pointer to the most recently added EXCEPTION_REGISTRATION record and only need to iterate through the rest of the records in order to get to the first record, whose handler normally points into either KERNEL32.DLL or NTDLL.DLL (on Windows Vista and 7). The following code does exactly this:

.search_default_handler:
   cmp dword[eax],0xFFFFFFFF
   jz .found_default_handler
   ;go to the previous handler
   mov eax,[eax]
   jmp .search_default_handler
.found_default_handler:
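The same walk can be sketched in portable C against a simulated address space, which makes it easy to experiment with outside of a live process. This is only an illustration under stated assumptions: `find_default_handler`, the `mem` buffer standing in for memory, and the step limit are hypothetical and not part of the original code. Record "addresses" are byte offsets into the buffer, and each record is two little-endian 32-bit fields (prev, then handler) as in the 32-bit layout described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEH_END 0xFFFFFFFFu   /* prev value of the first (oldest) record */

/* Reads a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walks the simulated chain starting at offset "first" (the value that
   [fs:0] would hold). Returns the handler field of the record whose
   prev field is SEH_END, or 0 if the chain leaves the buffer or loops. */
uint32_t find_default_handler(const uint8_t *mem, size_t size, uint32_t first)
{
    assert(mem != NULL);
    uint32_t rec = first;
    size_t max_steps = size / 8 + 1;          /* guard against cyclic chains */
    for (size_t step = 0; step < max_steps; ++step) {
        if (rec > size || size - rec < 8)
            return 0;                         /* record is out of bounds */
        uint32_t prev = read_u32(mem + rec);  /* [eax]   */
        if (prev == SEH_END)
            return read_u32(mem + rec + 4);   /* [eax+4] */
        rec = prev;                           /* mov eax,[eax] */
    }
    return 0;
}
```

Unlike the assembly loop, the sketch refuses to follow a chain that leaves the buffer or never terminates, which is worth keeping in mind given that these records live on the stack and may be tampered with.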

The last (or to say it right - the first) record would have its prev field equal to -1 and its handler field ([eax+4] in our case) contains the address of the default exception handler located in one of the dlls mentioned above. What's next? 

Things are really easy if we are on Windows XP, as we have an address inside KERNEL32.DLL and all we have to do is round it down to a 64 KB boundary (modules are mapped at 64 KB-aligned addresses)

   mov eax,[eax+4]
   and eax,0xFFFF0000

then "scroll" towards lower addresses in 64 KB steps and check each step for the 'MZ' signature

.look_for_mz:
   cmp word [eax],'MZ'
   jz .got_mz
   sub eax,0x10000
   jmp .look_for_mz
.got_mz:
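For readers who prefer to see the scan in C, here is a minimal sketch that runs the same alignment and backward search over a plain byte buffer. The function name `find_image_base`, the `NOT_FOUND` value and the idea of treating buffer offsets as addresses are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STEP      0x10000u        /* 64 KB, the step used in the loop above */
#define NOT_FOUND ((size_t)-1)

/* "mem" simulates an address space that starts at address 0.
   Starting from "addr" (an address somewhere inside a module), round
   down to a 64 KB boundary and step towards lower addresses until the
   'MZ' signature is found. Returns the base offset or NOT_FOUND. */
size_t find_image_base(const uint8_t *mem, size_t size, size_t addr)
{
    assert(mem != NULL);
    if (addr >= size)
        return NOT_FOUND;
    size_t base = addr & ~(size_t)(STEP - 1);       /* and eax,0xFFFF0000 */
    for (;;) {
        if (base + 1 < size && mem[base] == 'M' && mem[base + 1] == 'Z')
            return base;                            /* .got_mz */
        if (base < STEP)
            return NOT_FOUND;                       /* nothing below us */
        base -= STEP;                               /* sub eax,0x10000 */
    }
}
```

In the simulation every offset is readable. In a real process, a step that lands on unmapped memory raises an access violation, so production code would need its own exception handler or some other guard around the read.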

Once we find the 'MZ' signature, we have the handle to the library. My advice - save it somewhere. The problem is that we still do not know which library this is (KERNEL32.DLL or NTDLL.DLL). In a normal situation we would call GetModuleFileName, but, again, we are not in a normal situation. The solution is easy: the base address of the module already gives us everything we need. Does offset 0x3C look familiar? If not, then you should probably read this document. At this offset from the base address there is a DWORD holding the offset of the PE signature ('PE\0\0'), which is followed by the COFF header. We should check it anyway

   mov ebx,[eax+0x3C]      ;offset of the PE signature (a 32-bit field)
   add eax,ebx             ;EAX -> PE signature
   cmp dword [eax],'PE'    ;'P','E',0,0
   jz .found_pe
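The check above translates to C roughly as follows. This is a sketch that works on an image held in a byte buffer; `find_pe_signature` is a hypothetical helper name, and the bounds checks are additions that the assembly version does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Given an image that starts with 'MZ', follow the 32-bit field at
   offset 0x3C and verify that it leads to "PE\0\0".
   On success stores the signature offset in *pe_offset and returns 1;
   returns 0 otherwise. */
int find_pe_signature(const uint8_t *image, size_t size, uint32_t *pe_offset)
{
    assert(image != NULL && pe_offset != NULL);
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return 0;
    uint32_t off = read_u32(image + 0x3C);
    if (off > size || size - off < 4)
        return 0;                              /* offset leaves the image */
    if (memcmp(image + off, "PE\0\0", 4) != 0)
        return 0;                              /* signature not found */
    *pe_offset = off;
    return 1;
}
```

Returning a status instead of jumping keeps the failure path explicit, which matches the advice below to bail out and review the code when the signature is missing.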

There is not much we can do if the zero flag is not set after the comparison, which means that the PE signature has not been found. We just need to restore the stack to the state it was in when the process started, zero the eax register and execute the ret instruction. This would terminate our process. Basically, this would mean that we have to review our previous code. On the other hand, if the zero flag is set, this means that we have found the PE signature, and the COFF header starts right after it.

Now we need to locate the export directory. Its RVA (relative virtual address) and size are stored in the data directory array at the end of the Optional Header, which, in turn, appears right after the COFF header. We may skip the headers themselves and get straight to the export IMAGE_DATA_DIRECTORY entry, which in a 32-bit (PE32) image sits 0x78 bytes past the PE signature

   add eax,0x78

Yes, as simple as that. Now [eax] holds the RVA of the export directory and [eax+4] its size

typedef struct _IMAGE_DATA_DIRECTORY {
   DWORD RVA;     //EAX points here
   DWORD Size;
} IMAGE_DATA_DIRECTORY;

The next step is to read the RVA of the export table and add it to the image base address (the handle we obtained earlier)

   mov eax,[eax]
   add eax,[image_base_address]
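Where does 0x78 come from? It is 4 bytes of PE signature, plus 20 bytes of COFF header, plus the 96 bytes of the PE32 Optional Header that precede the data directories. The sketch below spells this out in C over a byte buffer. The function name `export_directory_entry` is hypothetical, and the sketch assumes a 32-bit PE32 image (Optional Header magic 0x10B); in a 64-bit PE32+ image the data directories start 16 bytes later, so 0x78 would not apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

enum {
    PE_SIG_SIZE   = 4,     /* "PE\0\0" */
    COFF_SIZE     = 20,    /* COFF file header */
    OPT_MAGIC_OFF = PE_SIG_SIZE + COFF_SIZE,       /* 0x18 */
    NUM_DIRS_OFF  = PE_SIG_SIZE + COFF_SIZE + 92,  /* 0x74 */
    EXPORT_OFF    = PE_SIG_SIZE + COFF_SIZE + 96   /* 0x78 */
};

/* pe_off is the offset of the PE signature inside the image.
   Stores the export directory RVA and size; returns 1 on success. */
int export_directory_entry(const uint8_t *image, size_t size, uint32_t pe_off,
                           uint32_t *rva, uint32_t *dir_size)
{
    assert(image != NULL && rva != NULL && dir_size != NULL);
    if (pe_off > size || size - pe_off < EXPORT_OFF + 8)
        return 0;
    if (read_u16(image + pe_off + OPT_MAGIC_OFF) != 0x10B)
        return 0;                          /* not PE32 */
    if (read_u32(image + pe_off + NUM_DIRS_OFF) < 1)
        return 0;                          /* no data directories at all */
    *rva      = read_u32(image + pe_off + EXPORT_OFF);      /* [eax]   */
    *dir_size = read_u32(image + pe_off + EXPORT_OFF + 4);  /* [eax+4] */
    return 1;
}
```

The returned RVA still has to be added to the image base, exactly as the two instructions above do.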

Congratulations! We are finally at the Export Directory Table!

Right now, we are interested in one particular field of this table, namely the "Name RVA", which is located at [eax+0x0C] and points to a NULL-terminated ASCII string containing the name of this very library. The procedure is almost identical to the previous one

   mov eax,[eax+0x0C]
   add eax,[image_base_address]

We are one step away from knowing what library this is. However, we have to implement a simple strcasecmp function. Why strcasecmp instead of strcmp? Just because strcasecmp is case-insensitive and we do not have to guess whether the library name is in upper, lower or even mixed case (like "KERNEL32.dll"). By comparing the library name with the strings 'kernel32.dll' and 'ntdll.dll' in turn, we identify the library.
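A case-insensitive comparison of this kind is only a few lines long. The following C sketch shows one possible shape for it; `names_equal_nocase`, `identify_module` and the `module_id` enum are hypothetical names chosen for this example, and only ASCII letters are folded, which is sufficient for the two DLL names we care about.

```c
#include <assert.h>
#include <stddef.h>

/* Folds ASCII upper-case letters to lower case, leaves the rest alone. */
static int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Returns 1 if both strings are equal ignoring ASCII case, 0 otherwise. */
int names_equal_nocase(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (ascii_lower((unsigned char)*a) != ascii_lower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;   /* both must end at the same position */
}

enum module_id { MODULE_UNKNOWN, MODULE_KERNEL32, MODULE_NTDLL };

/* Classifies the name string found through the export directory. */
enum module_id identify_module(const char *export_name)
{
    if (names_equal_nocase(export_name, "kernel32.dll"))
        return MODULE_KERNEL32;
    if (names_equal_nocase(export_name, "ntdll.dll"))
        return MODULE_NTDLL;
    return MODULE_UNKNOWN;
}
```

Note that this version returns 1 on a match, the opposite of the standard strcasecmp convention of returning 0 for equal strings.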

We are in KERNEL32.DLL!
If this is the case, then we are lucky: we only have to locate the address of the GetProcAddress API by parsing the export table (this deserves a separate article), or we may simply keep using our custom version of GetProcAddress. We are able to obtain the address of any API exported by KERNEL32.DLL as we have its handle. More than that, we are able to load additional libraries by first locating the LoadLibraryA or LoadLibraryW addresses. Basically, we are done.

We are in NTDLL.DLL...
This case is less desirable, but it may occur if our software runs on Windows Vista or higher. Needless to say, we have no access to GetProcAddress or LoadLibraryA (yet!). Instead we have the LdrLoadDll and LdrGetDllHandle API functions exported by NTDLL.DLL. Here are the prototypes of these functions, starting with LdrLoadDll:

NTSTATUS NTAPI LdrLoadDll(
     IN PWCHAR          PathToFile OPTIONAL,
     IN ULONG           Flags OPTIONAL,
     IN PUNICODE_STRING ModuleFileName,
     OUT PHANDLE        ModuleHandle);

Let's skip the optional values as we may safely set them to 0. The first non-optional parameter is ModuleFileName, but what does PUNICODE_STRING mean? It is a pointer to a structure that describes a UNICODE string. This structure may easily be built on the stack. Here is its declaration:

typedef struct _LSA_UNICODE_STRING {
   USHORT Length;
   USHORT MaximumLength;
   PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

Length - this field specifies the length, in bytes, of the string pointed to by Buffer, not including the terminating NULL;
MaximumLength - total size, in bytes, of the memory allocated for Buffer;
Buffer - pointer to a wide-character string (like 'K', 0, 'E', 0, 'R', 0, 'N', 0, 'E', 0, 'L', 0, '3', 0, '2', 0, '.', 0, 'D', 0, 'L', 0, 'L', 0, 0, 0).

The PHANDLE ModuleHandle is the pointer to a location in memory where the function should store the handle to a loaded library.
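To make the byte counts in the UNICODE_STRING description concrete, here is a small C sketch that fills in such a structure for a zero-terminated string of 16-bit characters. The type `unicode_string` and the function `init_unicode_string` are portable stand-ins invented for this example (they use uint16_t so that the sketch behaves the same on any platform); they are not the Windows definitions themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for UNICODE_STRING: same field order, 16-bit chars. */
typedef struct {
    uint16_t  Length;         /* bytes in use, terminator excluded */
    uint16_t  MaximumLength;  /* bytes available in Buffer         */
    uint16_t *Buffer;         /* pointer to the wide characters    */
} unicode_string;

/* Describes the zero-terminated wide string "text" in *us.
   Returns 1 on success, 0 if the string is too long for 16-bit fields. */
int init_unicode_string(unicode_string *us, uint16_t *text)
{
    assert(us != NULL && text != NULL);
    size_t chars = 0;
    while (text[chars] != 0)
        ++chars;
    if (chars * 2 + 2 > 0xFFFF)
        return 0;
    us->Length        = (uint16_t)(chars * 2);      /* 2 bytes per char */
    us->MaximumLength = (uint16_t)(chars * 2 + 2);  /* plus terminator  */
    us->Buffer        = text;
    return 1;
}
```

For "KERNEL32.DLL" (12 characters) this yields Length = 24 and MaximumLength = 26, which are the values one would store when building the structure by hand on the stack.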

Now, let's turn to LdrGetDllHandle

NTSTATUS NTAPI LdrGetDllHandle(
     IN PWORD           pwPath OPTIONAL,
     IN PVOID           Unused OPTIONAL,
     IN PUNICODE_STRING ModuleFileName,
     OUT PHANDLE        pHModule);

Let's skip the optional parameters again. Especially the "Unused" one.
ModuleFileName - is a pointer to UNICODE_STRING structure which describes the name of the DLL;
pHModule - a pointer to a location in memory where the function should store the result (the handle of the DLL).

We still have to implement a custom GetProcAddress function in order to retrieve the addresses of these two functions. The sample code is located at the end of this article.

Once we have the addresses of these functions, we should first try to get the module handle of 'KERNEL32.dll' by calling LdrGetDllHandle and, if that fails, try to load it with LdrLoadDll. If both functions fail - restore the stack, execute ret and check your code.

Once we have the module handle of KERNEL32.DLL, we are free to use the API functions it exports (e.g. GetProcAddress, LoadLibrary, etc.).

As you can see, this technique is simple indeed. More than that, it allows you to implement additional protection mechanisms like code obfuscation, SEH usage and many more in order to protect one of the most hack-sensitive parts of your software - the import section.

Hope this post was helpful. See you at the next post!

P.S. Custom GetProcAddress function. It is far from perfect, but it is enough for what we need.

;This is our custom GetProcAddress
;get_proc_address(HMODULE hModule, PCSTR procName)

if used _get_proc_address
baseAddress=         -4
numberNamePointers=  -8
exportAddressTableVA=-12
namePointerVA=       -16
ordinalTableVA=      -20
ordinalBase=         -24

_get_proc_address:
        push ebp
        mov ebp, esp
        sub esp,24
        push ebx ecx edx esi edi ebx
        mov esi,[ebp+8]                 ;ESI -> base address
        mov ebx,esi                     ;EBX is going to point to export table
        push ebx
        mov bx,[ebx+0x3C]
        movzx ebx,bx
        add ebx,[esp]
        add esp,4

        ;Set variables
        mov [ebp+baseAddress],esi
        add ebx,0x78                    ;now EBX points to the export table directory entry
        mov ebx,[ebx]
        add ebx,[ebp+baseAddress]
        mov eax,[ebx+24]                ;number of name pointers
        dec eax                         ;This is done in order to compare 0 based index
        mov [ebp+numberNamePointers],eax
        mov eax,[ebx+16]
        mov [ebp+ordinalBase],eax       ;ordinal base
        mov eax,[ebx+28]
        add eax,[ebp+baseAddress]
        mov [ebp+exportAddressTableVA],eax       ;VA of address table
        mov eax,[ebx+32]
        add eax,[ebp+baseAddress]
        mov [ebp+namePointerVA],eax     ;VA of name pointers table
        mov eax,[ebx+36]
        add eax,[ebp+baseAddress]
        mov [ebp+ordinalTableVA],eax    ;VA of ordinal table

        ;Reset offset counter
        xor ecx,ecx
.search_loop:
        push ecx
        shl ecx,2                       ;Offset must be multiple of 4, so we multiply counter by 4
        mov ebx,[ebp+namePointerVA]
        add ebx,ecx                      ;EBX now points to one of the exported API functions name pointer
        mov ebx,[ebx]
        add ebx,[ebp+baseAddress]
        push ebx dword[ebp+12]
        call _strcmp
        test eax,1
        jnz .found_api_name
        pop ecx
        cmp ecx,[ebp+numberNamePointers]
        jz .not_found_a_thing
        inc ecx
        jmp .search_loop

.found_api_name:
        pop ecx
        ;We now have the offset of the api in ECX register
        mov ebx,[ebp+ordinalTableVA]
        shl ecx,1
        add ebx,ecx                     ;EBX now points to the correct ordinal value
        mov bx,[ebx]
        movzx ebx,bx                    ;EBX contains an offset into the export address table
        shl ebx,2                       ;Multiply it by 4
        mov eax,[ebp+exportAddressTableVA]
        add ebx,eax
        mov eax,[ebx]
        add eax,[ebp+baseAddress]       ;now the EAX register contains the address of the exported function

.out:
        pop ebx edi esi edx ecx ebx
        mov esp,ebp
        pop ebp
        ret 8

.not_found_a_thing:
        xor eax,eax
        jmp .out
end if
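For comparison, the same lookup can be sketched in C over an image held in a byte buffer. This is not a translation of the routine above line by line, just an illustration of the three export tables working together; `lookup_export_rva` and its helpers are hypothetical names, the bounds checks are additions, and, like the assembly version, the sketch ignores forwarded exports and lookups by ordinal. Field offsets inside the export directory (20, 24, 28, 32, 36) are the same ones the assembly code uses, plus the function count at offset 20.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 1 if [off, off+len) lies inside an image of "size" bytes. */
static int in_range(size_t size, uint64_t off, uint64_t len)
{
    return off <= size && len <= size - off;
}

/* Bounded comparison of the string at image+rva with "name". */
static int name_matches(const uint8_t *image, size_t size,
                        uint32_t rva, const char *name)
{
    for (size_t i = 0; ; ++i) {
        if ((size_t)rva + i >= size) return 0;
        if (image[rva + i] != (uint8_t)name[i]) return 0;
        if (name[i] == '\0') return 1;
    }
}

/* Returns the RVA of the exported function "name", or 0 if not found.
   export_rva is the RVA of the export directory table; all RVAs are
   treated as offsets into "image" (an image mapped the way the loader
   maps it). The caller adds the image base to get the address. */
uint32_t lookup_export_rva(const uint8_t *image, size_t size,
                           uint32_t export_rva, const char *name)
{
    assert(image != NULL && name != NULL);
    if (!in_range(size, export_rva, 40))
        return 0;
    const uint8_t *dir = image + export_rva;
    uint32_t n_funcs = read_u32(dir + 20);  /* address table entries  */
    uint32_t n_names = read_u32(dir + 24);  /* number of name pointers */
    uint32_t funcs   = read_u32(dir + 28);  /* export address table    */
    uint32_t names   = read_u32(dir + 32);  /* name pointer table      */
    uint32_t ords    = read_u32(dir + 36);  /* ordinal table           */
    if (!in_range(size, funcs, (uint64_t)n_funcs * 4) ||
        !in_range(size, names, (uint64_t)n_names * 4) ||
        !in_range(size, ords,  (uint64_t)n_names * 2))
        return 0;
    for (uint32_t i = 0; i < n_names; ++i) {
        uint32_t name_rva = read_u32(image + names + 4 * (size_t)i);
        if (!name_matches(image, size, name_rva, name))
            continue;
        /* The i-th ordinal entry is an index into the address table. */
        uint16_t index = read_u16(image + ords + 2 * (size_t)i);
        if (index >= n_funcs)
            return 0;
        return read_u32(image + funcs + 4 * (size_t)index);
    }
    return 0;
}
```

The comparison here is exact and case-sensitive, as export names are. The assembly routine relies on a separate _strcmp helper that is expected to return 1 on a match and clean its own arguments off the stack.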


  1. You can also just get the PEB from FS:[0x30] and use the PEB_LDR_DATA member to walk the loaded modules list. That seems a lot more reliable than reverse-grepping the SEH handlers looking for a module header.

    This also works if you're not the first module in the process. Like, say, an injected DLL.

  2. Honestly speaking - there is no 100% reliable way as PEB_LDR_DATA may be replaced with fake one (does not mean that the default exception handler cannot).
    Also, some nerds may say that PEB stuff is poorly documented.

  3. And by the way, the method described above works from any module.

  4. Dear,
    Very good paper.
    You start by getting the initial exception handler, but what do you think about starting from the return address into GetProcAddress instead? It is at [ESP] when the program starts, and from there you can iterate, as described above, until you find the kernel32.dll address.


    1. It is an option. However, in this particular case, I am not importing any API (including the GetProcAddress).

    2. My bad, I made a mistake: I did not mean GetProcAddress, I meant to write CreateProcess. In every launched program, [ESP] holds the return address into CreateProcess (it is needed when the program exits and hands control back to the Windows operating system), so you can start iterating from this address because it will always be in the address space of kernel32.
      Sorry for the misunderstanding


    3. That may be an option as well. As an option, one may want to parse the PEB and get the kernel32.dll base from there.

      In either way, there's something I missed in the article - all those addresses may be faked by the attacker and developers should be ready for that.

  5. Another way to differentiate between kernel32.dll and ntdll.dll is checking the import table as ntdll.dll has the rva and size set to zero.

    Another thing to add (perhaps off topic) is that in case of WOW64, kernelbase.dll is also loaded along with ntdll and kernel32.

    1. Thanks a lot for this comment!

      You are right about the differentiation. However, I would be careful with the import section as it has more chances of being overwritten.

      As to kernelbase.dll - correct, but it is not related to exception handling (at least it has nothing to do with [FS:0]).

  6. Dear Alexey Lyashko,

    Do you have the _strcmp function also ?
    Ive try with this --> http://www.betamaster.us/blog/?p=465

    But it does not work..

    thanks ;)

    1. Hi, I am on a vacation right now, so, sorry, not going to write any code right now:)
      In either way, you've given me a good idea for the next post. Will write on a limited set of library functions.

  7. Dear Alexey Lyashko,

    thank you for your answer :)

    i hope that you enjoy your holiday and have some time to relax and put one and one together :D

    Ill see you soon i hope ;)


    1. You are welcome!

      Will make a post as soon as I return (about two weeks)

  8. Dear Alexey Lyashko,

    I have read your description with a high interest, and tried to develop some code based on it - here is where I encounter a problem:

    "Things are really easy if we are on Windows XP, as we have an address inside the KERNEL32.DLL and all we have to do is make it page aligned

    mov eax, [eax+4]
    and eax, 0xFFFF0000

    then "scroll" the pages towards lower addresses and check each page for 'MZ' signature

    cmp word [eax],'MZ'
    jz .got_mz
    sub eax,0x10000
    jmp .look_for_mz

    I'm trying your approach on Windows 7 x64. While scrolling by 0x10000 bytes I finally get to a memory location where I get an exception: "Access violation reading location 0x77370000." trying to access it to check for the 'MZ'.

    Do you have any idea what I'm doing wrong?

    1. I assume it is a 32 bit process, right?

    2. I have now found the issue. Under MASM I have to use the syntax
      CMP WORD PTR [EAX], "ZM"

      and - then it works like a charm :). My fault!

      Yes, it's a 32-bit process. Would that solution work for 64-bit process too? Or is there an alternative approach needed?

    3. My assumption is that it would work, just mind the size of addresses. But I have never tried it with 64 bit. Need to check this tomorrow.

    4. I've been just searching in Google and it looks the exception handling is done differently under x64: https://www.osronline.com/article.cfm?article=469. I need to dig more deeply into this ;).

      Warm regards

    5. Oh, thanks for this one! Will try to come up with something tomorrow :)

