System Programming: Basics of Data Obfuscation

Source code for this article may be found here.

One of the aspects of software anti RE (reverse engineering) protection is the need to protect sensitive data (for example decryption or license keys, etc.) There is quite a common practice of storing such data in encrypted form and using it by passing to a certain routine for decryption. I am not going to say, that this is not a good idea, but the problem with such approach is - vendors (in most cases) only rely on the complexity of the encryption algorithm, which is not as protective as it is thought to be and too often is placed in a single function (which, potentially, may be ripped and used with malicious intent).

I have already covered the basics of executable code obfuscation in this article, now it's time to take a look at how data may be hidden (this approach may be used with executable code as well) by, for example, putting it on stack and using several separate functions to reconstruct the original data.

The idea of hiding data in uninitialized variables (of which I am going to talk here) is not new at all, but still is rarely used, if at all.

Note for nerds:

This is not a tutorial, neither a howto. This is a basic explanation of the concept (no, this is not my invention and yes, there are other ways). The supplied code may be not perfect. It may contain bugs and is given here as an example only.

Needle and the Haystack

While needle is the data we want to hide, haystack is our whole program. You may hide data anywhere - data section, code section, etc. You may even spread parts of the data throughout the program. In this particular example, the data is pretended to be a part of the key computation algorithm. We will reconstruct the data on the stack (this is thread safe as every thread has its own stack in either way).

As this is (and I will reiterate this) just an example, our program is quite short:

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#define DATA_SIZE 16

int main(int argc, char** argv)

{

unsigned int key;

char* str;

char* res = (char*)malloc(sizeof(char) * DATA_SIZE);

// Calculate pseudo key

key = CalcKey(0x12345678);

// Mutate the key (get the actual key)

key = Mutate(0);

// Get the pointer to the data

str = GetPtr();

// Decode the data

Decode();

// Copy the data to a buffer

memcpy(res, str, sizeof(char) * DATA_SIZE);

// Print the data (which is actually a string)

puts(res);

return 0;

}

As you may see, there is a set of functions used to construct the hidden data (functions are written in assembler):

unsigned int CalcKey(unsigned int seed);

Uses "seed" to start preparing the decryption key. The value returned by such function should be used somewhere, for any kind of "decryption" operation, just in order to lead the attacker astray. You may say, that sooner or later, this move would be disclosed and the attacker would get back to this point and revise it and you will be right. However, given that "real life" implementation should be more complicated then the following code, it would take a while until the real purpose is discovered. Even more than that, it would still scare away some "hackers".

The following code is the implementation of the CalcKey function used in this example:

calc_key:
push ebp
mov ebp, esp
push edi

sub esp, 0x14 ;this is the amount of bytes we would need
;for data reconstruction
and dword [ebp - 4], 0 ;forming the "key"
dec dword [ebp - 4]
mov eax, [ebp + 8]
xor dword [ebp - 4], eax ;by this line, the real key is half ready
mov eax, [ebp - 4]

lea edi, [ebp - 0x14] ;go on with making the fake key
push eax
mov eax, 0x5A309FC0
stosd
mov eax, 0x617CD6E7
stosd
mov eax, 0x523088E7
stosd
mov eax, 0x365CFAA9
stosd
pop eax

xor eax, dword[ebp - 8]
xor eax, dword[ebp - 12]
xor eax, dword[ebp - 16]
xor eax, dword[ebp - 20] ;pseudo key is ready
;as you see, the return value is the pseudo key
;the real one remains on stack
pop edi
leave
ret

The highlighted constants, which may seem to be a part of the pseudo key calculation are in fact the data. As you can see, we put it on stack and "forget" there. It is important to mention, that you have to be careful if you decide to use stack for this purpose, and make sure the data is not being overwritten by subsequent calls to other functions. In order to make sure this does not happen, the suggestion is to put the actual data further into the stack (e.g. at [ebp - 0x100] instead of [ebp - 0x14] or even further).

I would say it again - make use of the pseudo key somewhere.

unsigned int Mutate(unsigned int dummy);

"dummy" is a dummy parameter and my personal suggestion is to do some manipulations with it. This function may seem as the one that produces different keys derived from the pseudo key computed by CalcKey depending on the "dummy" parameter. Well, it does. But those keys are not used in this example. What it does in deed, is mutating the half generated key, which is still present on stack where we left it in the CalcKey function (if it is not - check your code), and finalizing the key generation process.

mutate:
push ebp
mov ebp, esp

mov eax, [ebp - 4]
rol dword[ebp - 4], 1
xor dword[ebp - 4], eax ;finalize real key computation

xor eax, [ebp + 8] ;make use of "dummy" parameter

leave
ret

Once this function returns, we have a ready to use key somewhere in the stack space. A small note to satisfy nerds (as others should know this by default) - you should not call these functions one after the other in real life.

unsigned char* GetPtr(void);

This is the most simple (meaning short) function. All it does - returns a pointer to the location of data inside the stack area.

get_ptr:
push ebp
mov ebp, esp
sub esp, 0x14
mov eax, esp
leave
ret

In case of this example, the GetPtr() function returns the pointer itself, however, you may make it return any value that allows you to form a real pointer to the real data. Another recommendation is to call this function before the data gets decrypted so that it may be considered a pointer to immediate.

void Decode(void);

Finally, the end of this "complicated" procedure - decoding the actual data with the actual key.

decode:
push ebp
mov ebp, esp

mov eax, [ebp - 4] ;remember? the real key should still be here
xor [ebp - 8], eax ;decode the data
xor [ebp - 12], eax
xor [ebp - 16], eax
xor [ebp - 20], eax

leave
ret

Upon return from this function, the pointer obtained with GetPtr() would point to the decrypted data which is still on stack. Suggestion is - move it from there and overwrite that stack area with whatever you want.

Compiling and running the attached code would print the famous "Hello, World!" string to the terminal.

Hope I managed to explain the idea and that you may find this article interesting.

See you at the next!

17 comments:

AnonymousAugust 9, 2012 at 12:45 PM
Here's a link that I found on Google that gives a pretty good primer for reverse engineering and techniques to combat it. http://www.codeproject.com/Articles/30815/An-Anti-Reverse-Engineering-Guide
Miky GonzalezJanuary 10, 2013 at 5:37 PM
Can You put code for desencrypt datas? It is assumed that the encrypted data is 0x5A309FC0 ... 0x365CFAA9; After doing them 0x12345678 XOR with the key, you could add code to encrypt data?. Sorry my bad English. Regards
Miky GonzalezJanuary 10, 2013 at 9:53 PM
... But... That operations make with the key 0x12345678?

Note: Only a member of this blog may post a comment.

System Programming

Search This Blog

Thursday, May 17, 2012

Basics of Data Obfuscation

17 comments: