System Programming: encryption

Source code for this article may be found here.

One of the aspects of software anti RE (reverse engineering) protection is the need to protect sensitive data (for example decryption or license keys, etc.) There is quite a common practice of storing such data in encrypted form and using it by passing to a certain routine for decryption. I am not going to say, that this is not a good idea, but the problem with such approach is - vendors (in most cases) only rely on the complexity of the encryption algorithm, which is not as protective as it is thought to be and too often is placed in a single function (which, potentially, may be ripped and used with malicious intent).

I have already covered the basics of executable code obfuscation in this article, now it's time to take a look at how data may be hidden (this approach may be used with executable code as well) by, for example, putting it on stack and using several separate functions to reconstruct the original data.

The idea of hiding data in uninitialized variables (of which I am going to talk here) is not new at all, but still is rarely used, if at all.

Note for nerds:

This is not a tutorial, neither a howto. This is a basic explanation of the concept (no, this is not my invention and yes, there are other ways). The supplied code may be not perfect. It may contain bugs and is given here as an example only.

Needle and the Haystack

While needle is the data we want to hide, haystack is our whole program. You may hide data anywhere - data section, code section, etc. You may even spread parts of the data throughout the program. In this particular example, the data is pretended to be a part of the key computation algorithm. We will reconstruct the data on the stack (this is thread safe as every thread has its own stack in either way).

As this is (and I will reiterate this) just an example, our program is quite short:

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#define DATA_SIZE 16

int main(int argc, char** argv)

{

unsigned int key;

char* str;

char* res = (char*)malloc(sizeof(char) * DATA_SIZE);

// Calculate pseudo key

key = CalcKey(0x12345678);

// Mutate the key (get the actual key)

key = Mutate(0);

// Get the pointer to the data

str = GetPtr();

// Decode the data

Decode();

// Copy the data to a buffer

memcpy(res, str, sizeof(char) * DATA_SIZE);

// Print the data (which is actually a string)

puts(res);

return 0;

}

As you may see, there is a set of functions used to construct the hidden data (functions are written in assembler):

unsigned int CalcKey(unsigned int seed);

Uses "seed" to start preparing the decryption key. The value returned by such function should be used somewhere, for any kind of "decryption" operation, just in order to lead the attacker astray. You may say, that sooner or later, this move would be disclosed and the attacker would get back to this point and revise it and you will be right. However, given that "real life" implementation should be more complicated then the following code, it would take a while until the real purpose is discovered. Even more than that, it would still scare away some "hackers".

The following code is the implementation of the CalcKey function used in this example:

calc_key:
push ebp
mov ebp, esp
push edi

sub esp, 0x14 ;this is the amount of bytes we would need
;for data reconstruction
and dword [ebp - 4], 0 ;forming the "key"
dec dword [ebp - 4]
mov eax, [ebp + 8]
xor dword [ebp - 4], eax ;by this line, the real key is half ready
mov eax, [ebp - 4]

lea edi, [ebp - 0x14] ;go on with making the fake key
push eax
mov eax, 0x5A309FC0
stosd
mov eax, 0x617CD6E7
stosd
mov eax, 0x523088E7
stosd
mov eax, 0x365CFAA9
stosd
pop eax

xor eax, dword[ebp - 8]
xor eax, dword[ebp - 12]
xor eax, dword[ebp - 16]
xor eax, dword[ebp - 20] ;pseudo key is ready
;as you see, the return value is the pseudo key
;the real one remains on stack
pop edi
leave
ret

The highlighted constants, which may seem to be a part of the pseudo key calculation are in fact the data. As you can see, we put it on stack and "forget" there. It is important to mention, that you have to be careful if you decide to use stack for this purpose, and make sure the data is not being overwritten by subsequent calls to other functions. In order to make sure this does not happen, the suggestion is to put the actual data further into the stack (e.g. at [ebp - 0x100] instead of [ebp - 0x14] or even further).

I would say it again - make use of the pseudo key somewhere.

unsigned int Mutate(unsigned int dummy);

"dummy" is a dummy parameter and my personal suggestion is to do some manipulations with it. This function may seem as the one that produces different keys derived from the pseudo key computed by CalcKey depending on the "dummy" parameter. Well, it does. But those keys are not used in this example. What it does in deed, is mutating the half generated key, which is still present on stack where we left it in the CalcKey function (if it is not - check your code), and finalizing the key generation process.

mutate:
push ebp
mov ebp, esp

mov eax, [ebp - 4]
rol dword[ebp - 4], 1
xor dword[ebp - 4], eax ;finalize real key computation

xor eax, [ebp + 8] ;make use of "dummy" parameter

leave
ret

Once this function returns, we have a ready to use key somewhere in the stack space. A small note to satisfy nerds (as others should know this by default) - you should not call these functions one after the other in real life.

unsigned char* GetPtr(void);

This is the most simple (meaning short) function. All it does - returns a pointer to the location of data inside the stack area.

get_ptr:
push ebp
mov ebp, esp
sub esp, 0x14
mov eax, esp
leave
ret

In case of this example, the GetPtr() function returns the pointer itself, however, you may make it return any value that allows you to form a real pointer to the real data. Another recommendation is to call this function before the data gets decrypted so that it may be considered a pointer to immediate.

void Decode(void);

Finally, the end of this "complicated" procedure - decoding the actual data with the actual key.

decode:
push ebp
mov ebp, esp

mov eax, [ebp - 4] ;remember? the real key should still be here
xor [ebp - 8], eax ;decode the data
xor [ebp - 12], eax
xor [ebp - 16], eax
xor [ebp - 20], eax

leave
ret

Upon return from this function, the pointer obtained with GetPtr() would point to the decrypted data which is still on stack. Suggestion is - move it from there and overwrite that stack area with whatever you want.

Compiling and running the attached code would print the famous "Hello, World!" string to the terminal.

Hope I managed to explain the idea and that you may find this article interesting.

See you at the next!

Source code for this article may be found here.

There has been said and written too much on how software vendors do not protect their products, so let me skip this. Instead, in this article, I would like to concentrate on those relatively easy steps, which software vendors have to take in order to enhance their protection (using packers and protectors is good, but certainly not enough) by not letting the whole code appear in memory in readable form for a single moment.

Attack Vectors

Prior to dealing with "why attackers are able to x, y, z" let us map most frequent attack vectors in ascending order of their complexity.

Static Analysis - inspecting an executable in your favorite disassembler. It may be hard to believe, but majority of software products out there are vulnerable to static analysis, thus, showing us, that most of vendors do not care about proprietary algorithms' safety in addition to the fact, that they seem not to care about piracy either (but they tend to cry about it all the time).

Dynamic Analysis - running an executable inside your favorite debugger. This is a direct consequence of the previous paragraph. If an attacker is able to see the whole code in the disassembler - he/she definitely can run it in a debugger (even if this requires some minor patching).

Static Patching - this means changing the code located in the file of the executable. It may be changing one jump or adding a couple of dozens of bytes of attacker's own code in order to alter the way the program runs.

Dynamic Patching - similar to static patching in the idea behind the method. The only difference is, that dynamic patching is performed while the target executable is loaded into memory.

Dumping - saving the data in memory to a file on disk. This method may be very useful when examining a packed executable. Such memory dumps may be easily loaded into, for example, IDA and examined as if that was a regular executable (some additional actions may be required for better convenience, like rebasing the program or adjusting references to other modules).

In most cases, at least two of the aforementioned vectors would be present in time of attack.

Packers and Cryptors

Using different packers, cryptors and protectors is quite a known practice among software vendors. The problem with this is, that few of them go beyond packing the code in file and fully unpacking it in memory and, sometimes, protecting the packer itself. By saying "go beyond" I mean any implementation of anti debugging methods of any kind. Besides, such utilities do not prevent an attacker from obtaining a memory dump good enough to deal with. One or two check the consistency of the code, which may (yes - may, as it not necessarily can) prevent patching the code, but every wall has a door and it only matters how much effort opening that door may require. Bottom line is, that these types of protection may only be useful in preventing static analysis, but only if there is no relevant unpacker or decrypter.

Protectors

This is "the next step" in the evolution of packers. These provide a bit more options and tools to estimate how secure the environment is. In addition to packing the code, they also utilize code consistency checks, anti debugging tricks, license verification, etc. Protectors are good countermeasures to the first three (or even four) attack vectors. However, even if certain protector has some anti patching heuristics, it is only good as long as it (heuristics) is not reversed and either patched or fooled in any other way.

Despite all the "good" in protectors, even such powerful tools are not able to do much in order to prevent an attacker from obtaining a memory dump, which may be obtained by either using ReadProcessMemory or injecting a DLL and dumping "from inside" while suspending all other threads.

Anything Else?

Yes, there are some basic protections provided by the operating system, like session separation, for example, which prevents creation of remote threads (used with DLL injection), but those are hardly worth even mentioning here.

The picture drawn here appears to be sad and hopeless enough. However, there are several good methods to add more protection to a software product and more pain in some parts of the body to attackers.

Code Obfuscation

While this methods is widely used by protectors and, sometimes, by packers and cryptors (unfortunately, in most cases, for protecting themselves only) it seems to be almost totally unknown to the rest of software vendors. In my opinion, branching the code more than it is usually needed may not be considered as code obfuscation, it may rather be called an attempt to obfuscate an algorithm. The situation is such, that even implementation of something similar to this would be a significant improvement in vendors' efforts to protect their products.

Hiding the Code

Software vendors repeatedly fail at understanding two facts - popular means more vulnerable (in regard of commercial solutions) and the fact that there is no magic cure and they have to put some additional effort into protecting their products.

One of the options, which I would like to cover here, is dynamic encryption of executable code. This method promises that only certain parts of the code would be present in memory in readable (possible to disassemble) form, while the rest of the code (and preferably data) is encrypted.

I am still sure - the best way to explain something is explanation by example. The small piece of C code described below is intended to show the principle of dynamic code encryption. It contains several functions in addition to main - the first is the one (the target) we are going to protect. It does nothing special, just calculates the factorial of 10 and prints it out. The main function invokes a decrypter in order to decrypt the target, calls the target (thus, displaying the factorial of 10) and, finally invokes cryptor to encrypt the target back (hide it).

The code may be compiled for both Linux (using gcc) or Windows (using mingw32). It uses obfuscation code from here.

Target Function
Our target function is quite simple (it only calculates factorial for hardcoded number):

void func()
{
__asm__ __volatile__("enc_start:");
{ /* Braces are used here as we do not want IDA to track parameters */
int i, f = 10;
for( i = 9; i > 0; i--)
f *= i;
printf("10! = %d\n\n", f);
}
__asm__ __volatile__("enc_end:");
}

You noticed the labels in the beginning and in the end of the function body? These labels are only used for getting the start address of the region to be decrypted/encrypted and calculating it's length. Due to the fact that these labels are no processed by the C preprocessor, but are passed to assembler, they are accessible from other functions by default. The rest of the code is enclosed by braces in order to put all the actions related to variables i and f in the encrypted part of the function. This is what it looks like, before being decrypted:

Although, in attached code, the initial encryption is performed upon program start, in reality, it should be done with, probably, a third party tool. You would only have to put some unique marking at the start and end of the region you want to encrypt. For example:

__asm__(".byte 0x0D, 0xF0, 0xAD, 0xDE");
void func()
{
...
}
__asm__(".byte 0xAD, 0xDE, 0xAD, 0xDE");

Encryption Algorithm
Selection of encryption algorithm is totally up to you. In this particular case, the algorithm is quite primitive (it does not even require a key):

b - byte
i - position
for i = 0; i < length; i++
b(i+1) = b(i+1) xor (b(i) rol 1)
b(0) = b(0) xor (b(length) rol 1)

Execution Flow
So, let us assume that the program started with the function already encrypted. As this is just an example, we can get to the business right away:

int main()
{
unsigned int addr, len;
__asm__ __volatile__("movl $enc_start, %0\n\t"\
"movl $enc_end, %1\n\t"\
: "=r"(addr), "=r"(len));
len -= addr;
decode(addr, len);
func();
encode(addr, len);
return 0;
}

The code above is self explanatory enough. There are, however, a couple of things needed to be mentioned. decode and encode functions should take care of modifying the access rights of the memory region they are going to operate on. The following code may be used:

#ifdef WIN32
#include <windows.h>
#define SETRWX(addr, len) {\
DWORD attr;\
VirtualProtect((LPVOID)((addr) &~ 0xFFF),\
(len) + ((addr) - ((addr) &~ 0xFFF)),\
PAGE_EXECUTE_READWRITE,\
&attr);\
}
#define SETROX(addr, len) {\
DWORD attr;\
VirtualProtect((LPVOID)((addr) &~ 0xFFF),\
(len) + ((addr) - ((addr) &~ 0xFFF)),\
PAGE_EXECUTE_READ,\
&attr);\
}
#else
#include <sys/mman.h>
#define SETRWX(addr, len) mprotect((void*)((addr) &~ 0xFFF),\
(len) + ((addr) - ((addr) &~ 0xFFF)),\
PROT_READ | PROT_EXEC | PROT_WRITE)
#define SETROX(addr, len) mprotect((void*)((addr) &~ 0xFFF),\
(len) + ((addr) - ((addr) &~ 0xFFF)),\
PROT_READ | PROT_EXEC)
#endif

This is the only platform dependent code in this sample.

Bottom Line
The example given above is really a simple one. Things would be at least a bit more complicated in real life. While there is only one encrypted function, imagine, that there are several encrypted functions. Some of them are encrypted without keys (like the one above) others require keys of different complexity. Several keys may be hardcoded (for those parts that were encrypted in order to draw attacker's attention away from the "real" thing), others should be computed on the fly.

Example:
Function A is encrypted without a key. When decrypted, it performs several operations and decrypts function B, which, in turn encrypts function A back and calculates a key for function C based on the binary content of function A (or A and B to prevent breakpoints) or even based on some other code in unrelated place.

Of course, there is no such thing as unbreakable protection. But the time it takes to break certain protection makes the difference. A company that produces software product which is cracked the next day may hardly benefit from all the hard work. On the other hand, it is totally possible to create protection schemes that would require months to be cracked.

I will try and cover additional possibilities and aspects of software protection in my future posts in a hope to at least try to change the situation.

Hope this post was helpful.
See you at the next!

System Programming

Search This Blog

Thursday, May 17, 2012

Basics of Data Obfuscation

Friday, March 2, 2012

Dynamic Code Encryption as an Anti Dump and Anti Reverse Engineering measure