Search This Blog

Showing posts with label data obfuscation. Show all posts
Showing posts with label data obfuscation. Show all posts

Saturday, May 19, 2012

Simple Runtime Framework by Example

Source code for this article may be found here.

These days we are simply surrounded by different software frameworks. Just to name a few: Java, .Net and, actually, many more. Have you ever wondered how those work or have you ever wanted or needed to implement one? In this article, I will cover a simple or even trivial runtime framework.

As usual - note for nerds:
The source code given in this article is for example purposes only. I know that this framework is far from being perfect, therefore, this article is not a howto or tutorial - just an explanation of principle. Error checks are omitted on purpose. You want to implement a real framework - do it yourself, including error checks.

Now, to let's get to business.

Software Framework
Wikipedia gives the following identification for the term "Software Framework" - "A software framework is a universal, reusable software platform used to develop applications, products and solutions. Software Frameworks include support programs, compilers, code libraries, an application programming interface (API) and tool sets that bring together all the different components to enable development of a project or solution". As you can see, software framework is quite a complex thing. However, let's simplify it and see how it basically work.

Figure 1.
Software Framework
The diagram on the left may give you a good understanding of what Software Framework is and what role it performs. Simply saying, it is a shim between the user application and the Operating System. There are at least two types of Software Frameworks:

  1. Application Programming Interface (API) - if we take a look at Windows API, we may see that it is a framework as well. However, it may be bypassed or, at least, a programmer may choose to decrease the interaction with it by, for example, using functions from ntdll.dll instead of those provided by kernel32.dll or even "talk" to Windows kernel directly (highly not recommended, but may be unavoidable some times) through interrupts.
  2. .Net like framework - total isolation of user code from the operating system. Such frameworks are mostly virtual machines totally isolating user application from the operating system and hardware. However, such framework has to provide the application with all the services available in the Operating System. This is type of framework we are going to build in this article.




Virtual Machine
The basics of building a simple virtual machine is covered in this article, so I will only give a brief explanation here. Our VM in this example will consist of the following components:
  1. Virtual CPU
    A structure that represents a CPU - basically, has 6 registers and a pointer to the stack:

    typedef struct
    {
       unsigned int  regs[6];
       unsigned int* stack;
    }CPU;

    The 6 registers are general purpose
    A, B, C and D, where A is also used to store system call return value and C is used as a counter for LOOP instruction, STACK POINTER (SP) and INSTRUCTION POINTER (IP).
  2. Instruction Interpreter
    A function or a set of functions which responsible for interpretation of the pseudo assembly (or call it intermediate assembly language) designed for this virtual machine (in this case 14 instructions).
  3. System Call Handler
    This component provides the means for the user application to interact with the Operating System (in this case 2 system calls:
    sys_write and sys_exit).

Core Function
The name of the function speaks for itself. This is the first function of the framework implementation which gains control. In this particular case, it does not have too many things to do - initialization of the virtual CPU and execution of the command interpreter, until the user application exits (signals the framework to terminate the execution).

Implementation
It is a common practice to implement a framework as a DLL (dynamic link library), for example, mscoree.dll - the core of the .Net framework. I do not see any reason to reinvent the wheel, therefore, this framework will be implemented as a DLL as well.

All is fine, you may say, but how should we pass the compiled pseudo assembly code to the framework? Well, I bet, most of you know how to do that. In case you don't - no worries, just keep reading.

In case of .Net framework (at least as far as I know), the loader identifies a file as a .Net executable, reads in the meta header, and initializes the mscoree.dll appropriately. We will not go through all those complications and will use a regular PE file:


Figure 2.
Customized PE file.

  • PE Header - regular PE Header, no modification needed;
  • Code Section - simply invokes the core function of the framework:

    push pseudo_code_base_address
    call [core]
  • Import Section - regular import section that only imports one function from the framework.dll - framework.core(unsigned int);
  • Data Section - this section contains the actual compiled pseudo assembly code and whatever headers you may come up with, that may instruct the core() function to correctly initialize the application.






Example Executable Source Code
The following is the source code of the example executable. It may be compiled with FASM (Flat Assembler).

include 'win32a.asm' ;we need the 'import' macro
include 'asm.asm'    ;pseudo assembly commands and constants

format PE console
entry start

section '.text' readable executable
start:
   push _base
   call [core_func]

section '.idata' data import writeable
library  framework, 'framework.dll'

import framework,\
   core_func, 'Core'

section '.data' readable writeable
_base:
   loadi A, _base
   loadi B, 0x31
   _add A, B
   loadr B, A
   loadi A, _data.string
   loadi C, _data.string_len
   _call _func
   loadi A, 1
   loadi B, _data.string
   loadi C, _data.str_len
   _int sys_write
   loadi A, 1
   loadi B, _data.msg
   loadi C, _data.msg_len
   _int sys_write
   _int sys_exit


_func:
   ; A = string address
   ; B = key
   ; C = counter
.decode:
   loadr D, A
   xorr D, B
   storr A, D
   loadi D, 4
   _add A, D
   _loop .decode
   _ret



_data:
.string db 'Hello, developer!', 10, 13
.str_len = $-.string
db 0
.string_len = ($-.string)/4
.msg db 'The program will now exit.', 10, 13
.msg_len = $-.msg

;Encrypt one string
load k dword from _base + 0x31
repeat 5
load a dword from _data.string + (% - 1) * 4
a = a xor k
store dword a at _data.string + (% - 1) * 4
end repeat



The code above produces a tiny executable which invokes framework's core() function. Pseudo assembly code simply prints two messages (the first one is decoded prior to being printed). Full sources are attached to this article (see the very first line).

The good thing is that you do not have to start the interpreter and load this executable (or specify it as a command line parameter) - you may simply run this executable, Windows loader will bind it with the framework.dll automatically. The bad thing is that you would, most probably, have to write your own compiler, because writing assembly is fun, dealing with pseudo assembly is fun as well, BUT, only when done for fun. It is not as pleasant when dealing with production code.


Possible uses
Unless you are trying to create a framework that would overcome existing software frameworks, you may use such approach to increase the protection of your applications by, for example, virtualizing cryptography algorithms or any other part of your program which is not essential by means of execution speed, but represents a sensitive intellectual property.

Hope you find this article helpful.

See you at the next!




Thursday, May 17, 2012

Basics of Data Obfuscation

Source code for this article may be found here.

One of the aspects of software anti RE (reverse engineering) protection is the need to protect sensitive data (for example decryption or license keys, etc.) There is quite a common practice of storing such data in encrypted form and using it by passing to a certain routine for decryption. I am not going to say, that this is not a good idea, but the problem with such approach is - vendors (in most cases) only rely on the complexity of the encryption algorithm, which is not as protective as it is thought to be and too often is placed in a single function (which, potentially, may be ripped and used with malicious intent).

I have already covered the basics of executable code obfuscation in this article, now it's time to take a look at how data may be hidden (this approach may be used with executable code as well) by, for example, putting it on stack and using several separate functions to reconstruct the original data.

The idea of hiding data in uninitialized variables (of which I am going to talk here) is not new at all, but still is rarely used, if at all.

Note for nerds: 
This is not a tutorial, neither a howto. This is a basic explanation of the concept (no, this is not my invention and yes, there are other ways). The supplied code may be not perfect. It may contain bugs and is given here as an example only.


Needle and the Haystack

While needle is the data we want to hide, haystack is our whole program. You may hide data anywhere - data section, code section, etc. You may even spread parts of the data throughout the program. In this particular example, the data is pretended to be a part of the key computation algorithm. We will reconstruct the data on the stack (this is thread safe as every thread has its own stack in either way). 

As this is (and I will reiterate this) just an example, our program is quite short:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DATA_SIZE 16

int main(int argc, char** argv)
{
   unsigned int key;
   char*        str;
   char*        res = (char*)malloc(sizeof(char) * DATA_SIZE);

   // Calculate pseudo key
   key = CalcKey(0x12345678);

   // Mutate the key (get the actual key)
   key = Mutate(0);

   // Get the pointer to the data
   str = GetPtr();

   // Decode the data 
   Decode();

   // Copy the data to a buffer
   memcpy(res, str, sizeof(char) * DATA_SIZE);

   // Print the data (which is actually a string)
   puts(res);

   return 0;
}

As you may see, there is a set of functions used to construct the hidden data (functions are written in assembler):


unsigned int CalcKey(unsigned int seed);

Uses "seed" to start preparing the decryption key. The value returned by such function should be used somewhere, for any kind of "decryption" operation, just in order to lead the attacker astray. You may say, that sooner or later, this move would be disclosed and the attacker would get back to this point and revise it and you will be right. However, given that "real life" implementation should be more complicated then the following code, it would take a while until the real purpose is discovered. Even more than that, it would still scare away some "hackers".

The following code is the implementation of the CalcKey function used in this example:

calc_key:
   push  ebp
   mov   ebp, esp
   push  edi
   
   sub   esp, 0x14            ;this is the amount of bytes we would need 
                              ;for data reconstruction
   and   dword [ebp - 4], 0   ;forming the "key"
   dec   dword [ebp - 4]
   mov   eax, [ebp + 8]
   xor   dword [ebp - 4], eax ;by this line, the real key is half ready
   mov   eax, [ebp - 4]


   lea   edi, [ebp - 0x14]    ;go on with making the fake key
   push  eax
   mov   eax, 0x5A309FC0
   stosd
   mov   eax, 0x617CD6E7
   stosd
   mov   eax, 0x523088E7
   stosd
   mov   eax, 0x365CFAA9
   stosd
   pop   eax


   xor   eax, dword[ebp - 8]
   xor   eax, dword[ebp - 12]
   xor   eax, dword[ebp - 16]
   xor   eax, dword[ebp - 20] ;pseudo key is ready
                              ;as you see, the return value is the pseudo key
                              ;the real one remains on stack
   pop   edi
   leave
   ret

The highlighted constants, which may seem to be a part of the pseudo key calculation are in fact the data. As you can see, we put it on stack and "forget" there. It is important to mention, that you have to be careful if you decide to use stack for this purpose, and make sure the data is not being overwritten by  subsequent calls to other functions. In order to make sure this does not happen, the suggestion is to put the actual data further into the stack (e.g. at [ebp - 0x100] instead of [ebp - 0x14] or even further).

I would say it again - make use of the pseudo key somewhere.



unsigned int Mutate(unsigned int dummy);

"dummy" is a dummy parameter and my personal suggestion is to do some manipulations with it. This function may seem as the one that produces different keys derived from the pseudo key computed by CalcKey depending on the "dummy" parameter. Well, it does. But those keys are not used in this example. What it does in deed, is mutating the half generated key, which is still present on stack where we left it in the CalcKey function (if it is not - check your code), and finalizing the key generation process.

mutate:
   push  ebp
   mov   ebp, esp
   
   mov   eax, [ebp - 4]
   rol   dword[ebp - 4], 1
   xor   dword[ebp - 4], eax  ;finalize real key computation


   xor   eax, [ebp + 8]       ;make use of "dummy" parameter


   leave
   ret            

Once this function returns, we have a ready to use key somewhere in the stack space. A small note to satisfy nerds (as others should know this by default)  - you should not call these functions one after the other in real life.


unsigned char* GetPtr(void);

This is the most simple (meaning short) function. All it does - returns a pointer to the location of data inside the stack area.

get_ptr:
   push  ebp
   mov   ebp, esp
   sub   esp, 0x14
   mov   eax, esp
   leave
   ret


In case of this example, the GetPtr() function returns the pointer itself, however, you may make it return any value that allows you to form a real pointer to the real data. Another recommendation is to call this function before the data gets decrypted so that it may be considered a pointer to immediate.


void Decode(void);

Finally, the end of this "complicated" procedure - decoding the actual data with the actual key.

decode:
   push  ebp
   mov   ebp, esp


   mov   eax, [ebp - 4]      ;remember? the real key should still be here
   xor   [ebp - 8], eax      ;decode the data
   xor   [ebp - 12], eax
   xor   [ebp - 16], eax
   xor   [ebp - 20], eax


   leave
   ret
   
Upon return from this function, the pointer obtained with GetPtr() would point to the decrypted data which is still on stack. Suggestion is - move it from there and overwrite that stack area with whatever you want.

Compiling and running the attached code would print the famous "Hello, World!" string to the terminal.

Hope I managed to explain the idea and that you may find this article interesting.

See you at the next!