Monday, February 27, 2012

Basics of Executable Code Obfuscation

Source code for this article may be found here.

The problem of software security has already been raised in my previous articles more than once. This article is no exception.

The majority of software vendors position themselves as number one in the industry, even though there is always more than one "number one". But what unites them all (well, almost all) in reality is the fact that they all suffer from piracy, they are all aware of it and, last of all, they do almost nothing to change the situation. It is surely impossible to defeat software piracy completely, but it is definitely realistic to make the "pirating" process a pain in the neck for pirates.

In this article, I would like to cover the basics of executable code obfuscation - a relatively simple technique which is, unfortunately, rarely utilized by software vendors, as they mostly rely on out-of-the-box solutions. Those of you who have read my previous articles know my attitude towards those solutions (roughly speaking: the more public, the more vulnerable), but let me make a slightly different statement here - no matter how good and complex an out-of-the-box solution is, most software vendors mistakenly assume that the mere presence of a software protection package well known for its complexity is enough, without even trying to utilize it properly. There are cases, however, when proper utilization of third-party protection tools is not possible (then why use them at all?) due to their negative impact on the program's execution (either by consuming too much time or by causing faults of various kinds).

The inability to protect a software product brings forth many more problems than plain software piracy. How about proprietary algorithms? The question I always want to ask software vendors is - do you really think that listing patents and a copyright notice in the "About" section is enough to push an attacker back?

There are numerous methods of executable code protection, from static/dynamic encryption up to complex virtual machines. However, this article is intended to briefly cover the most basic and, at the same time, rarely used (in the world of legitimate software) method: executable code obfuscation. It is hard to believe that this would really stop someone with modern reverse engineering tools from compromising your software product, but it would definitely make the process of static analysis and algorithm restoration a lot harder. In addition, it would draw the attacker's attention away from other protections utilized (at least for some time, depending on the attacker's skills).

Executable Code Obfuscation
This technique is intended to prevent static analysis of the executable code and to reduce the possibility of algorithm restoration. Obfuscated code appears as nonsense at first glance:

But at first glance only. Take a closer look (this is the beginning of a main function) and you will see that sub_401421 is not a separate function but belongs to the preceding code - it calculates the address of the next instruction to be executed, pushes it onto the stack as if it were a return address and executes ret, thereby performing a jump.
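The push-then-ret pattern described above can be sketched with GCC inline assembly. This is only an illustration, not the code behind the disassembly: it assumes GCC or Clang on x86-64 (the article's own code is 32-bit), and the function name jump_via_ret is made up for the example. On any other target the assembly is compiled out and the function simply falls through.

```c
/* Returns 1 once execution has continued past the disguised jump. */
static int jump_via_ret(void)
{
    int reached = 0;

#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "leaq 1f(%%rip), %%rax\n\t"    /* address of the instruction after ret */
        "leaq -128(%%rsp), %%rsp\n\t"  /* step over the x86-64 red zone        */
        "pushq %%rax\n\t"              /* push it as if it were a return addr  */
        "ret\n"                        /* "return" to it: in effect a jump     */
        "1:\n\t"
        "leaq 128(%%rsp), %%rsp\n\t"   /* restore the stack pointer            */
        :
        :
        : "rax", "memory");
#endif

    reached = 1;
    return reached;
}
```

A disassembler usually treats ret as the end of a function, so the instructions after the label tend to be listed as a separate routine, which matches the sub_401421 observation above.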

This is all nice, but let's get to a simple example - the best way to understand how it works and, more importantly, how it may be implemented.

Trivial Example
As the title of this section suggests, the example code shown here is not intended to perform any meaningful task. It simply prints out "Hello!!!" several times, tests an obfuscated call to the library function "printf" and exits. The code is 32-bit, runs on both Linux and Windows and may be compiled with either the GCC or MinGW C compiler. You would have to adjust it a bit for use with MSVC, however. It contains three functions:
  1. constructor - responsible for encrypting the addresses of the functions that are going to be called by other parts of the code (namely "function #2" and "printf");
  2. my_func - this is "function #2"; it actually prints out the "Hello!!!" string and accepts the number of times to print the string and a pointer to the string to be printed;
  3. main - well, this is the main function.

There are a couple of steps to be taken prior to writing the code itself - set up the addresses of the functions that are going to be used.

Declare three global variables of type unsigned int:

unsigned int  addr;    // Address of "my_func" function
unsigned int  printer; // Address of "printf" function
unsigned int  _mask;   // XOR mask for encryption of addresses

The next step is to implement the constructor function (see this post about constructor functions in MSVC):
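A minimal sketch of what such a constructor might look like, assuming GCC or MinGW and their __attribute__((constructor)) extension. It is not necessarily identical to the original listing: the mask value and the name protect_addresses are arbitrary choices for the example, the body of my_func is a stand-in, and the globals are widened to uintptr_t so that the sketch also builds on 64-bit targets.

```c
#include <stdint.h>
#include <stdio.h>

/* Same roles as the globals declared above; uintptr_t instead of
   unsigned int is an assumption made for 64-bit portability. */
uintptr_t addr;     /* masked address of my_func */
uintptr_t printer;  /* masked address of printf  */
uintptr_t _mask;    /* XOR mask                  */

/* Stand-in for my_func; only its address matters in this sketch. */
int my_func(int count, const char *text)
{
    int i;
    for (i = 0; i < count; i++)
        fputs(text, stdout);
    return count;
}

/* GCC and MinGW run functions carrying this attribute before main(). */
__attribute__((constructor))
static void protect_addresses(void)
{
    _mask   = (uintptr_t)0xA5A5A5A5u;        /* arbitrary example value */
    addr    = (uintptr_t)&my_func ^ _mask;   /* store only masked values */
    printer = (uintptr_t)&printf  ^ _mask;
}
```

XOR-ing a stored value with the same mask again yields the real address, so the rest of the code never has to reference my_func or printf directly.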

Getting it Done
Write the "my_func" function:

call2 is a macro that performs an obfuscated call and accepts the following parameters:
  • address - the address of the function to call;
  • param1 - first parameter of the callee;
  • param2 - second parameter of the callee.
Here is the definition of the call2 macro:
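For readers who only want the idea, here is a portable C stand-in for such a macro. It is a sketch under assumptions, not the article's assembly version: the func2_t type, the unmask helper, the mask value and the int return value of my_func are all invented for the example, and protect_addresses is an ordinary function rather than a constructor.

```c
#include <stdint.h>
#include <stdio.h>

typedef int (*func2_t)(int, const char *);        /* hypothetical callee type */

static uintptr_t _mask = (uintptr_t)0x5EC0DE5Eu;  /* example XOR mask          */
static uintptr_t addr;                            /* masked address of my_func */

/* Recover the real address at run time. The volatile copy discourages the
   compiler from folding the XOR back into a plain direct call. */
static func2_t unmask(uintptr_t masked)
{
    volatile uintptr_t real = masked ^ _mask;
    return (func2_t)real;
}

/* Portable stand-in for call2(address, param1, param2). */
#define call2(address, param1, param2) \
    (unmask(address)((param1), (param2)))

/* Prints text count times and reports how many times it printed. */
static int my_func(int count, const char *text)
{
    int printed = 0;
    while (printed < count) {
        fputs(text, stdout);
        printed++;
    }
    return printed;
}

/* Normally done in the constructor; a plain function in this sketch. */
static void protect_addresses(void)
{
    addr = (uintptr_t)&my_func ^ _mask;
}
```

After protect_addresses() has run, call2(addr, 3, "Hello!!!\n") prints the string three times without a direct reference to my_func at the call site. This version only suits callees that match func2_t; a variadic function such as printf would need its own pointer type, a limitation an assembly implementation does not share.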

Those of you using Microsoft's C compiler (e.g. Microsoft Visual C++) should rewrite the Assembly code according to Intel syntax. Use this cheat sheet if you are not familiar with AT&T Assembly syntax.

The main function in this example is quite simple but it seems to illustrate the concept:

The code above is self-explanatory enough.
Put all the code together, build and run. This is what the output looks like:

Quite simple, isn't it? Take a look at the compiled result in IDA Pro, for example, and see that the code has become harder to read and, even more importantly, to understand.

For nerds: yes, you have to include the stdio.h header file.

Taking it Further
As I mentioned above, this is a trivial example, neither a tutorial nor instructions to follow. Obviously, obfuscation macros in production code should be at least a bit more complicated than what is shown here.

You may also add some macros with "junk" code in order to obfuscate other parts (not related to function calls) of your code. Like this, for example:
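One possible shape for a junk macro, written in plain C rather than assembly, is an opaque predicate: a condition that always has the same outcome but is not obvious from the listing. The names JUNK, junk_seed and sum_to below are hypothetical, and an aggressive optimizer may still see through the trick, so treat this as a sketch of the idea only.

```c
/* Read through volatile so the compiler cannot assume the value. */
static volatile unsigned int junk_seed = 7u;

/* j * (j + 1) is always even, so the branch below is never taken, but the
   multiply, the test and the store may still show up in the disassembly. */
#define JUNK()                                 \
    do {                                       \
        unsigned int j = junk_seed;            \
        if (((j * (j + 1u)) & 1u) != 0u)       \
            junk_seed = j ^ 0x5A5A5A5Au;       \
    } while (0)

/* Hypothetical function with junk sprinkled between the real statements. */
static int sum_to(int n)
{
    int total = 0;
    int i;
    for (i = 1; i <= n; i++) {
        JUNK();
        total += i;
        JUNK();
    }
    return total;
}
```

The function still computes the plain sum 1 + 2 + ... + n; the junk only adds instructions that an analyst has to recognize as dead weight.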

Despite the simplicity, even such trivial macros can convert readable code into something like this (main starts at loc_4013FB):

Thanks for reading! Hope this post was helpful.
See you in the next one!


  1. Please provide a link to the C,C++ source code? eg. on GitHub gist

    1. Actually, all the code is given here. But, anyway, I will upload my sources and put a link in the beginning of the article.

  2. I googled this because I was trying to crack a program, and it used something like this :)
     Thanks! You made it much more simple to reverse engineer for me!

    1. Good to know, that people are getting here via google.
      However, I am afraid you misunderstand the term Open Source. Open Source does not mean "cracked binary". As to reversing - it is one thing to reverse something in order to see how it works and improve your skills, and another thing to "crack" something to make it "free" (by the way, "cracking" is not the same as "hacking"), which is more like theft.


