Source code for this article may be found here.
Much has been said and written about how software vendors fail to protect their products, so let me skip that. Instead, in this article, I would like to concentrate on the relatively easy steps software vendors can take to enhance their protection (using packers and protectors is good, but certainly not enough) by never letting the whole code appear in memory in readable form, even for a single moment.
Attack Vectors
Before dealing with "why attackers are able to do X, Y and Z", let us map the most frequent attack vectors in ascending order of complexity.
Static Analysis - inspecting an executable in your favorite disassembler. It may be hard to believe, but the majority of software products out there are vulnerable to static analysis, which shows that most vendors do not care about the safety of their proprietary algorithms - in addition to the fact that they do not seem to care much about piracy either (though they tend to cry about it all the time).
Dynamic Analysis - running an executable inside your favorite debugger. This is a direct consequence of the previous point: if an attacker is able to see the whole code in a disassembler, he or she can certainly run it in a debugger (even if this requires some minor patching).
Static Patching - changing the code in the executable file itself. It may mean flipping a single jump or adding a few dozen bytes of the attacker's own code in order to alter the way the program runs.
Dynamic Patching - similar to static patching in the idea behind the method. The only difference is that dynamic patching is performed while the target executable is loaded in memory.
Dumping - saving data from memory to a file on disk. This method may be very useful when examining a packed executable. Such memory dumps may easily be loaded into, for example, IDA and examined as if they were regular executables (some additional actions may be required for convenience, like rebasing the program or adjusting references to other modules).
In most cases, at least two of the aforementioned vectors are combined in a single attack.
Packers and Cryptors
Using various packers, cryptors and protectors is a fairly well-known practice among software vendors. The problem is that few of these tools go beyond packing the code in the file, fully unpacking it in memory and, sometimes, protecting the packer itself. By "go beyond" I mean implementing anti-debugging methods of any kind. Besides, such utilities do not prevent an attacker from obtaining a memory dump good enough to work with. One or two check the consistency of the code, which may (yes - may, not necessarily will) prevent patching, but every wall has a door, and it is only a matter of how much effort opening that door requires. The bottom line is that these types of protection may only be useful against static analysis, and only as long as there is no relevant unpacker or decrypter.
Protectors
This is "the next step" in the evolution of packers. These provide a bit more options and tools to estimate how secure the environment is. In addition to packing the code, they also utilize code consistency checks, anti debugging tricks, license verification, etc. Protectors are good countermeasures to the first three (or even four) attack vectors. However, even if certain protector has some anti patching heuristics, it is only good as long as it (heuristics) is not reversed and either patched or fooled in any other way.
Despite all the "good" in protectors, even such powerful tools cannot do much to prevent an attacker from obtaining a memory dump, which may be produced either with ReadProcessMemory or by injecting a DLL and dumping "from the inside" while all other threads are suspended.
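Just to illustrate how little effort such a dump takes, here is a rough sketch of the ReadProcessMemory approach; the process ID, base address and region size are placeholders that an attacker would obtain by other means (for example, with VirtualQueryEx):

/* Rough sketch: save an arbitrary region of a target process to disk.
 * The PID, base address and size are placeholders. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int dump_region(DWORD pid, LPCVOID base, SIZE_T size, const char *path)
{
    HANDLE proc = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION,
                              FALSE, pid);
    unsigned char *buf;
    SIZE_T got = 0;
    int result = -1;

    if (proc == NULL)
        return -1;

    buf = (unsigned char *)malloc(size);
    if (buf != NULL && ReadProcessMemory(proc, base, buf, size, &got)) {
        FILE *f = fopen(path, "wb");
        if (f != NULL) {
            fwrite(buf, 1, got, f);     /* a dump IDA will happily load */
            fclose(f);
            result = 0;
        }
    }

    free(buf);
    CloseHandle(proc);
    return result;
}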
Anything Else?
Yes, there are some basic protections provided by the operating system - session separation, for example, which prevents creation of remote threads (used for DLL injection) in processes of another session - but those are hardly worth mentioning here.
The picture drawn so far may appear sad and hopeless. However, there are several good methods to add more protection to a software product - and more pain in certain parts of the body to attackers.
Code Obfuscation
While this method is widely used by protectors and, sometimes, by packers and cryptors (unfortunately, in most cases, only to protect themselves), it seems to be almost totally unknown to the rest of the software vendors. In my opinion, merely branching the code more than usually needed should not be considered code obfuscation; it is rather an attempt to obfuscate an algorithm. Still, the situation is such that even implementing something similar to this would be a significant improvement in vendors' efforts to protect their products.
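Just to give an idea of one common obfuscation building block (not necessarily the technique from the linked article), here is a contrived sketch of an opaque predicate - a branch whose outcome is fixed at run time but not obvious from the disassembly:

/* Contrived illustration: x | 1 is always odd, so the "decoy" branch is
 * never taken, yet an analyst reading the disassembly has to consider it.
 * The function and the 9973 check are made up for this example. */
int check_serial(unsigned int serial)
{
    volatile unsigned int x = serial | 1;   /* volatile keeps the compiler
                                               from folding the branch away */
    if ((x & 1) == 0) {
        /* decoy code - never executed */
        return (int)(serial ^ 0xDEADBEEFu);
    }
    return (serial % 9973u) == 0;           /* the "real" check */
}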
Hiding the Code
Software vendors repeatedly fail to understand two facts: popular means more vulnerable (as far as commercial protection solutions are concerned), and there is no magic cure - they have to put some additional effort into protecting their products.
One of the options I would like to cover here is dynamic encryption of executable code. With this method, only certain parts of the code are present in memory in readable (disassemblable) form at any given moment, while the rest of the code (and preferably data) remains encrypted.
I am still sure that the best way to explain something is by example. The small piece of C code described below is intended to show the principle of dynamic code encryption. It contains several functions in addition to main - the first one (the target) is the one we are going to protect. It does nothing special, just calculates the factorial of 10 and prints it out. The main function invokes a decrypter in order to decrypt the target, calls the target (thus displaying the factorial of 10) and, finally, invokes the encryptor to encrypt the target back (hide it).
The code may be compiled for both Linux (using gcc) and Windows (using mingw32). It uses obfuscation code from here.
Target Function
Our target function is quite simple (it only calculates the factorial of a hardcoded number):
void func()
{
    __asm__ __volatile__("enc_start:");
    {   /* Braces are used here as we do not want IDA to track parameters */
        int i, f = 10;
        for (i = 9; i > 0; i--)
            f *= i;
        printf("10! = %d\n\n", f);
    }
    __asm__ __volatile__("enc_end:");
}
Did you notice the labels at the beginning and at the end of the function body? These labels are only used for obtaining the start address of the region to be decrypted/encrypted and for calculating its length. Since such labels are not handled by the C compiler itself but are passed straight to the assembler, they are accessible from other functions by default. The rest of the code is enclosed in braces in order to put all the actions related to the variables i and f into the encrypted part of the function. Before being decrypted, this region shows up in the disassembler as meaningless bytes.
Although, in the attached code, the initial encryption is performed at program start, in reality it should probably be done by a third-party tool. You would only have to put some unique markers at the start and at the end of the region you want to encrypt. For example:
__asm__(".byte 0x0D, 0xF0, 0xAD, 0xDE");
void func()
{
...
}
__asm__(".byte 0xAD, 0xDE, 0xAD, 0xDE");
Encryption Algorithm
The choice of encryption algorithm is totally up to you. In this particular case, the algorithm is quite primitive (it does not even require a key):
b - byte
i - position

for i = 0; i < length; i++
    b(i+1) = b(i+1) xor (b(i) rol 1)
b(0) = b(0) xor (b(length) rol 1)
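In C, this transform and its inverse might look like the sketch below. Note that, just like the pseudo code, it also touches the byte at offset length (the byte right at enc_end); decoding undoes that symmetrically:

/* Sketch of the transform above. rol8() rotates a byte left by one bit. */
static unsigned char rol8(unsigned char b)
{
    return (unsigned char)((b << 1) | (b >> 7));
}

static void encode_bytes(unsigned char *p, unsigned int len)
{
    unsigned int i;
    for (i = 0; i < len; i++)
        p[i + 1] ^= rol8(p[i]);
    p[0] ^= rol8(p[len]);
}

static void decode_bytes(unsigned char *p, unsigned int len)
{
    unsigned int i;
    p[0] ^= rol8(p[len]);
    for (i = len; i > 0; i--)           /* reverse order undoes the chaining */
        p[i] ^= rol8(p[i - 1]);
}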
Execution Flow
So, let us assume that the program starts with the function already encrypted. As this is just an example, we can get down to business right away:
int main()
{
    unsigned int addr, len;

    __asm__ __volatile__("movl $enc_start, %0\n\t"
                         "movl $enc_end, %1\n\t"
                         : "=r"(addr), "=r"(len));
    len -= addr;

    decode(addr, len);
    func();
    encode(addr, len);
    return 0;
}
The code above is self-explanatory enough. There are, however, a couple of things that need to be mentioned. The decode and encode functions should take care of modifying the access rights of the memory region they are going to operate on. The following code may be used:
#ifdef WIN32
#include <windows.h>

#define SETRWX(addr, len) {\
    DWORD attr;\
    VirtualProtect((LPVOID)((addr) &~ 0xFFF),\
                   (len) + ((addr) - ((addr) &~ 0xFFF)),\
                   PAGE_EXECUTE_READWRITE,\
                   &attr);\
}

#define SETROX(addr, len) {\
    DWORD attr;\
    VirtualProtect((LPVOID)((addr) &~ 0xFFF),\
                   (len) + ((addr) - ((addr) &~ 0xFFF)),\
                   PAGE_EXECUTE_READ,\
                   &attr);\
}

#else
#include <sys/mman.h>

#define SETRWX(addr, len) mprotect((void*)((addr) &~ 0xFFF),\
                                   (len) + ((addr) - ((addr) &~ 0xFFF)),\
                                   PROT_READ | PROT_EXEC | PROT_WRITE)

#define SETROX(addr, len) mprotect((void*)((addr) &~ 0xFFF),\
                                   (len) + ((addr) - ((addr) &~ 0xFFF)),\
                                   PROT_READ | PROT_EXEC)
#endif
__asm__(".byte 0x0D, 0xF0, 0xAD, 0xDE");
void func()
{
...
}
__asm__(".byte 0xAD, 0xDE, 0xAD, 0xDE");
Encryption Algorithm
Selection of encryption algorithm is totally up to you. In this particular case, the algorithm is quite primitive (it does not even require a key):
b - byte
i - position
for i = 0; i < length; i++
b(i+1) = b(i+1) xor (b(i) rol 1)
b(0) = b(0) xor (b(length) rol 1)
Execution Flow
So, let us assume that the program started with the function already encrypted. As this is just an example, we can get to the business right away:
int main()
{
unsigned int addr, len;
__asm__ __volatile__("movl $enc_start, %0\n\t"\
"movl $enc_end, %1\n\t"\
: "=r"(addr), "=r"(len));
len -= addr;
decode(addr, len);
func();
encode(addr, len);
return 0;
}
The code above is self explanatory enough. There are, however, a couple of things needed to be mentioned. decode and encode functions should take care of modifying the access rights of the memory region they are going to operate on. The following code may be used:
#ifdef WIN32
#include <windows.h>
#define SETRWX(addr, len) {\
DWORD attr;\
VirtualProtect((LPVOID)((addr) &~ 0xFFF),\
(len) + ((addr) - ((addr) &~ 0xFFF)),\
PAGE_EXECUTE_READWRITE,\
&attr);\
}
#define SETROX(addr, len) {\
DWORD attr;\
VirtualProtect((LPVOID)((addr) &~ 0xFFF),\
(len) + ((addr) - ((addr) &~ 0xFFF)),\
PAGE_EXECUTE_READ,\
&attr);\
}
#else
#include <sys/mman.h>
#define SETRWX(addr, len) mprotect((void*)((addr) &~ 0xFFF),\
(len) + ((addr) - ((addr) &~ 0xFFF)),\
PROT_READ | PROT_EXEC | PROT_WRITE)
#define SETROX(addr, len) mprotect((void*)((addr) &~ 0xFFF),\
(len) + ((addr) - ((addr) &~ 0xFFF)),\
PROT_READ | PROT_EXEC)
#endif
This is the only platform-dependent code in this sample.
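With these macros in place, decode and encode themselves boil down to a few lines. A possible sketch, reusing the byte transform shown earlier (encode_bytes/decode_bytes) and keeping the sample's 32-bit assumption that an address fits into an unsigned int:

/* Possible decode()/encode() bodies built on SETRWX/SETROX and on the
 * byte transform sketched in the Encryption Algorithm section. */
void decode(unsigned int addr, unsigned int len)
{
    SETRWX(addr, len + 1);                    /* +1: the transform also
                                                 touches the byte at enc_end */
    decode_bytes((unsigned char *)addr, len); /* reveal the plain code */
    SETROX(addr, len + 1);                    /* drop write access again */
}

void encode(unsigned int addr, unsigned int len)
{
    SETRWX(addr, len + 1);
    encode_bytes((unsigned char *)addr, len); /* hide the code once more */
    SETROX(addr, len + 1);
}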
Bottom Line
The example given above is really a simple one; things would be at least a bit more complicated in real life. While there is only one encrypted function here, imagine that there are several. Some of them are encrypted without keys (like the one above), others require keys of varying complexity. Several keys may be hardcoded (for the parts that are encrypted only to draw the attacker's attention away from the "real" thing), while others should be computed on the fly.
Example:
Function A is encrypted without a key. When decrypted, it performs several operations and decrypts function B, which, in turn, encrypts function A back and calculates a key for function C based on the binary content of function A (or of both A and B, to defeat breakpoints), or even based on some other code in a completely unrelated place.
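The key computation itself can be as simple as hashing the bytes of function A between its markers. A hypothetical sketch (an FNV-1a-style accumulation; the pointer and length are assumed to come from labels just like enc_start/enc_end): a planted software breakpoint (0xCC) or a patch anywhere in A immediately yields a wrong key for C.

/* Hypothetical key derivation: hash the current bytes of function A.
 * p/len would be obtained from marker labels around A. */
static unsigned int key_from_region(const unsigned char *p, unsigned int len)
{
    unsigned int key = 0x811C9DC5u;     /* FNV-1a offset basis */
    unsigned int i;
    for (i = 0; i < len; i++)
        key = (key ^ p[i]) * 16777619u; /* FNV-1a prime */
    return key;                         /* feed this into C's decrypter */
}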
Of course, there is no such thing as unbreakable protection. But the time it takes to break a particular protection makes all the difference. A company whose software product is cracked the next day can hardly benefit from all the hard work. On the other hand, it is entirely possible to create protection schemes that take months to crack.
I will try to cover additional possibilities and aspects of software protection in future posts, in the hope of at least beginning to change the situation.
Hope this post was helpful.
See you in the next one!
Comments
Interesting article, and a good point you mention that there is "no such thing as unbreakable protection". You should probably state this up-front in bold as well because, like you yourself said, it will be broken sooner or later - but this is an informative article on twiddling with the binary and seems fun!
Made it bold :)
And yes, that is fun :)
Have you thought about platforms enforcing W^X? Any ideas on these?
I assume you are talking about architectures supporting XOR masking of instructions. Unfortunately, I am not that familiar with those, but I think that as long as you are able to change memory protection and/or allocate executable memory, nothing is impossible. In either case, you feed the CPU instructions (or a form of instructions) that it supports, so, correct me if I am wrong, this should not have a great impact.
No, actually I meant platforms that do not allow writable code pages - iOS, for example.
I am on Android :-)
But speaking seriously - if you can change page protection, then you may either align your code properly, so that the decrypting code does not fall into the same page as the one being decrypted, or allocate a separate range of pages to hold the "dynamic" code.
Indeed this is a very nice technique for anti-dumping.
If anyone wants to play with a challenge that uses this kind of stuff, then I recommend the one I coded last year for the Athcon security conference.
You may find the challenge here --> http://www.anti-reversing.com/2011-2/ctf-reversing-part-2011/
...and if you are just looking for some more reading then you could take a look at my documentation and the submitted solution here --> http://www.anti-reversing.com/2011-2/ctf-reversing-part-2011/authors-documentation-submitted-solution/
Regards,
Kyriakos Economou (@kyREcon)
I never thought about utilizing something like this. It's a good idea, but how would you implement it in a threaded application where the code to be protected is executed concurrently by multiple threads - something like a network server application? The only thing that I can think of is to encrypt and decrypt on the fly by using locks. Furthermore, what about the performance considerations of such a setup?
Something else to consider... use this together with code obfuscation and a VM.
VM basics are covered in another article on this blog.
Code obfuscation is great but almost useless once a dynamic approach is in use.
As to a threaded application, you may use mutex semantics as a workaround, but, generally, you would either have to use a different protection mechanism or simply not put code that needs such protection into the body of a thread function.
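A rough sketch of that mutex-based workaround (POSIX threads; the names and the reference counting are purely illustrative): decrypt when the first thread enters, re-encrypt when the last one leaves.

/* Illustrative only: serialize decode/encode around concurrent callers. */
#include <pthread.h>

static pthread_mutex_t enc_lock = PTHREAD_MUTEX_INITIALIZER;
static int enc_users = 0;

void protected_call(void (*fn)(void), unsigned int addr, unsigned int len)
{
    pthread_mutex_lock(&enc_lock);
    if (enc_users++ == 0)
        decode(addr, len);              /* first user in: reveal the code */
    pthread_mutex_unlock(&enc_lock);

    fn();                               /* run the protected function */

    pthread_mutex_lock(&enc_lock);
    if (--enc_users == 0)
        encode(addr, len);              /* last user out: hide it again */
    pthread_mutex_unlock(&enc_lock);
}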
Very interesting! A while ago I implemented this kind of protection in Delphi, only without SETRWX...
Now I am learning C and will try to apply it :)
Thank you very much!
Glad the info came in handy. Happy to help.
DeleteHi there. I took the code as is and included it in a Win32 Visual Studio 2010 empty project. I get a bunch of errors.
ReplyDeleteIt cry about __asm__ and __volatile__ not being defined. I replaced them with __asm and __volatile, but I still got errors that were present initially.
ASM( "enc_start:\n\t"); give error:
inline assembler syntax error in 'opcode'; found '('
Can you provide any help?
Thank you!
This code will not compile under MS Visual Studio (any version of it), as it was not written for it. In order to compile the code, you have to either change it (so that it conforms to VS) or use mingw32.