Thursday, October 13, 2011

Dennis Ritchie, R.I.P.

Dennis Ritchie, the father of the C language and co-creator of Unix, has passed away this week...

Hijack Linux System Calls: Part III. System Call Table

This is the last part of the Hijack Linux System Calls series. By now, we have created a simple loadable kernel module which registers a miscellaneous character device. This means that we have almost everything we need in order to patch the system call table. We still have to fill in the our_ioctl function and add a couple of declarations to our source file. By the end of this article we will be able to intercept any system call in our system, should there be a need for that.

System Call Table
The system call table is simply an area in kernel memory that contains the addresses of system call handlers, and a system call number is an index into that table. When we call sys_write (to be more precise, when libc calls sys_write) on a 32 bit system, number 4 is placed in the EAX register before int 0x80. This simply tells the kernel to go to the system call table, take the entry at index 4 and call the function that entry points to. On a 64 bit system it would be number 1 in RAX (and syscall instead of int 0x80). System call numbers are defined in arch/x86/include/asm/unistd_32.h and arch/x86/include/asm/unistd_64.h for 32 and 64 bit platforms respectively. In this article, we are going to deal with the sys_open system call, which is number 5 on 32 bit systems and number 2 on 64 bit systems.
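To see that a system call really is selected by a number, here is a minimal user space sketch (it is not part of our module). It skips the usual libc wrapper and requests the service by number through the libc syscall() function. The helper name getpid_by_number is made up for this illustration; SYS_getpid comes from <sys/syscall.h>.

```c
#include <sys/syscall.h>   /* SYS_getpid: the system call number */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: invoke getpid by its number instead of
   going through the regular libc wrapper. */
static pid_t getpid_by_number(void)
{
   /* syscall() places the number where the kernel expects it
      and enters the kernel for us */
   return (pid_t)syscall(SYS_getpid);
}
```

Calling getpid_by_number() should give the same value as getpid(), since both end up in the same kernel handler.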

Since modern kernels no longer export the sys_call_table symbol, we will have to find its location in memory ourselves. There are some "hackish" ways of finding the location of sys_call_table programmatically, but they are not reliable: they may work on one kernel and fail on another. Therefore, we are going to use the simplest and the safest way and read its location from the /boot/ file. For simplicity, we will just use grep and hardcode the address. On my computer, the command grep "sys_call_table" /boot/ (you should check the file name on your system, as on mine it is /boot/) gives this output: "ffffffff816002e0 R sys_call_table". Add the global variable unsigned long *sys_call_table = (unsigned long*)0xYour_Address_Of_Sys_call_table.
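If you would rather not copy the address by hand, a line in the format quoted above is easy to parse. The following is only a sketch under the assumption that every line has the form "address type name"; the helper name symbol_address is hypothetical. It compares the whole symbol name, because a plain grep may also match similarly named symbols (for example an ia32_sys_call_table on some 64 bit kernels).

```c
#include <stdio.h>    /* sscanf */
#include <string.h>   /* strcmp */

/* Hypothetical helper: parse one "address type name" line and return the
   address if the name equals the wanted symbol, or 0 otherwise. */
static unsigned long long symbol_address(const char *line, const char *wanted)
{
   unsigned long long address;
   char type;
   char name[128];

   /* Three fields are expected: hex address, one letter type, symbol name */
   if (sscanf(line, "%llx %c %127s", &address, &type, name) != 3)
      return 0;
   return strcmp(name, wanted) == 0 ? address : 0;
}
```

Feeding it the line from my machine together with "sys_call_table" yields 0xffffffff816002e0, which is the value to assign to the global pointer.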

We will start, as usual, by adding new includes to our code. This time, those include files are:

   #include <linux/highmem.h>
   #include <asm/unistd.h>

The first one is needed because the system call table is located in a read only memory area in modern kernels, so we will have to modify the protection attributes of the memory page containing the address of the system call that we want to intercept. The second one provides the system call numbers mentioned above: we are not going to use hardcoded values for system calls, instead, we will use the values defined in the unistd.h header.

Now we define two values, which will be used as the cmd argument to the our_ioctl function. One will tell us to patch the table, the other one will tell us to fix it by restoring the original value.

   /* IOCTL commands */
   #define IOCTL_PATCH_TABLE 0x00000001
   #define IOCTL_FIX_TABLE   0x00000004

Add one more global variable, int is_set=0, which will be used as a flag telling whether the real (0) or the custom (1) system call is in use.

It is important to save the address of the original sys_open as we are not going to fully implement our own, instead, our function will log information about the call arguments and then perform the actual (original) call. Therefore, we define a function pointer (for original call) and a function (for custom call):

   /* Pointer to the original sys_open */
   asmlinkage int (*real_open)(const char* __user, int, int);
   /* Our replacement */
   asmlinkage int custom_open(const char* __user file_name, int flags, int mode)
   {
      printk("interceptor: open(\"%s\", %X, %X)\n", file_name,
                                                    flags, mode);
      return real_open(file_name, flags, mode);
   }

You have noticed the "asmlinkage" attribute. It is, actually, a define for a compiler attribute. We will not go that deep this time; I will just say that this attribute tells the compiler how arguments are passed to the function, given that it is being called from assembly code. The "__user" macro signifies that the argument points to user space and that the function must perform certain operations to copy the data to kernel space when needed. We do not need that here, so we may ignore it for now.

Another crucial couple of functions is the pair that will allow us to modify the memory page protection attributes directly. One may say that this is risky, but, in my opinion, this is less risky than actually patching the system call table: first, it is architecture dependent and we know that architectures do not change drastically; second, we use kernel functions for that.

   /* Make the page writable */
   int make_rw(unsigned long address)
   {
      unsigned int level;
      pte_t *pte = lookup_address(address, &level);
      /* Set the RW bit only if it is not set yet */
      if(!(pte->pte & _PAGE_RW))
         pte->pte |= _PAGE_RW;
      return 0;
   }

   /* Make the page write protected */
   int make_ro(unsigned long address)
   {
      unsigned int level;
      pte_t *pte = lookup_address(address, &level);
      pte->pte = pte->pte &~ _PAGE_RW;
      return 0;
   }

pte_t stands for typedef struct { unsigned long pte; } pte_t and represents the page table entry. Although it is simply an unsigned long, it is declared as a struct in order to avoid type misuse.

pte_t *lookup_address(unsigned long address, unsigned int *level) is provided by the kernel and performs all the dirty work for us and returns a pointer to the page table entry that describes the page containing the address. This function accepts the following arguments:

address - an address in virtual memory;
level - pointer to an unsigned integer which receives the level of the mapping.
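The bit manipulation performed by make_rw and make_ro can be tried out safely in user space. The sketch below is only a model: model_pte_t and MODEL_PAGE_RW are made up names, and MODEL_PAGE_RW merely mirrors the position of the x86 read/write bit (bit 1). Inside the kernel you would work with the real pte_t and _PAGE_RW.

```c
/* User space model of a page table entry, for illustration only.
   MODEL_PAGE_RW mirrors the position of the x86 read/write bit (bit 1). */
#define MODEL_PAGE_RW 0x2UL

typedef struct { unsigned long pte; } model_pte_t;

/* Set the read/write bit, leaving all other bits untouched */
static void model_make_rw(model_pte_t *entry)
{
   entry->pte |= MODEL_PAGE_RW;
}

/* Clear the read/write bit, leaving all other bits untouched */
static void model_make_ro(model_pte_t *entry)
{
   entry->pte &= ~MODEL_PAGE_RW;
}

/* Non-zero when the entry allows writing */
static int model_is_writable(const model_pte_t *entry)
{
   return (entry->pte & MODEL_PAGE_RW) != 0;
}
```

The point of the model is that only one bit changes: every other attribute stored in the entry survives a make_rw/make_ro round trip.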

Let's Get to Business
We are almost there. The only thing left is the actual implementation of the our_ioctl function. Add the following lines:

      switch(cmd)
      {
      case IOCTL_PATCH_TABLE:
         make_rw((unsigned long)sys_call_table);
         real_open = (void*)*(sys_call_table + __NR_open);
         *(sys_call_table + __NR_open) = (unsigned long)custom_open;
         make_ro((unsigned long)sys_call_table);
         is_set = 1;
         break;
      case IOCTL_FIX_TABLE:
         make_rw((unsigned long)sys_call_table);
         *(sys_call_table + __NR_open) = (unsigned long)real_open;
         make_ro((unsigned long)sys_call_table);
         is_set = 0;
         break;
      }

And these lines to the cleanup_module function:

      /* Restore the original handler only if the table is still patched */
      if(is_set)
      {
         make_rw((unsigned long)sys_call_table);
         *(sys_call_table + __NR_open) = (unsigned long)real_open;
         make_ro((unsigned long)sys_call_table);
      }
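The save, replace and restore sequence is easier to reason about on a toy table. The following user space sketch models it with an ordinary array of function pointers. All the names (model_table, model_patch and so on) are hypothetical and nothing here touches the kernel. It also shows why a flag like is_set is useful: patching twice must not save our own hook as the "original", and restoring without patching must not write a NULL pointer into the table.

```c
#include <stddef.h>   /* NULL */

#define MODEL_NR_OPEN 2            /* slot we hook; the value is arbitrary here */

typedef int (*model_call_t)(int);

static int model_calls_seen = 0;

/* Stand-in for the original handler */
static int model_original(int x) { return x + 1; }

static model_call_t model_table[4] = { NULL, NULL, model_original, NULL };
static model_call_t model_real = NULL;   /* saved original, like real_open */
static int model_is_set = 0;             /* like is_set in the module */

/* Replacement: record the call, then forward to the saved original */
static int model_custom(int x)
{
   model_calls_seen++;
   return model_real(x);
}

static void model_patch(void)
{
   if (model_is_set)
      return;                   /* never save our own hook as the original */
   model_real = model_table[MODEL_NR_OPEN];
   model_table[MODEL_NR_OPEN] = model_custom;
   model_is_set = 1;
}

static void model_restore(void)
{
   if (!model_is_set)
      return;                   /* nothing to restore */
   model_table[MODEL_NR_OPEN] = model_real;
   model_is_set = 0;
}
```

Callers that go through model_table[MODEL_NR_OPEN] get the same result before, during and after patching; only the call counter reveals that the hook was in place.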

Our interceptor module is ready. Well, almost ready as we need to compile it. Do that as usual - make.

Finally, we have our module set and ready to use, but we still have to create a "client" application, the code that will "talk" to our module and tell it what to do. Fortunately, this is much simpler than the rest of the work that we have done here. Create a new source file and enter the following lines:

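   #include <stdio.h>
   #include <unistd.h>
   #include <sys/ioctl.h>
   #include <sys/types.h>
   #include <sys/stat.h>
   #include <fcntl.h>

   /* Define ioctl commands */
   #define IOCTL_PATCH_TABLE 0x00000001
   #define IOCTL_FIX_TABLE   0x00000004

   int main(void)
   {
      /* Open the device registered by our module */
      int device = open("/dev/interceptor", O_RDWR);
      if(device < 0)
      {
         perror("open");
         return 1;
      }
      /* Patch the table, then restore the original handler */
      ioctl(device, IOCTL_PATCH_TABLE);
      ioctl(device, IOCTL_FIX_TABLE);
      close(device);
      return 0;
   }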

Save it as manager.c and compile it with gcc -o manager manager.c

Load the module, run ./manager and then unload the module when manager exits. Then issue the dmesg | tail command. If you see lines containing "interceptor: open(blah blah blah)", then you know that those lines were produced by our handler.

Now we are able to intercept system calls in modern kernels despite the fact that sys_call_table is no longer exported. Although we deal with low level structures, which normally are only used by the kernel, this is still a relatively safe method as long as your module is compiled against the running kernel.

Hope this post was helpful. See you at the next one!

Wednesday, October 12, 2011

Hijack Linux System Calls: Part II. Miscellaneous Character Drivers

We all know what device drivers are: the hands of the operating system that make it possible for the kernel to handle hardware. We also know that there are two types of devices, character and block, depending on the way they handle data transmissions, but what does a "miscellaneous" device mean? To put it simply, it means what it says. On one hand, this may be a driver that handles simple hardware; on the other hand, it is the way Linux allows us to create virtual devices as one of the ways to communicate with kernel modules, which is exactly what we need in order to hijack Linux system calls.

In this section, we will create a simple virtual character device, which will be used by a user space process to instruct our kernel module whether it should hijack or restore a certain system call. This virtual device will be controlled with the ioctl function. For simplicity, I decided not to add read/write handlers to this device, as they are not really required for what we are about to do, although they are a "nice to have" feature.

Miscellaneous devices are represented with the struct miscdevice, which is declared in include/linux/miscdevice.h as

   struct miscdevice
   {
      int minor;
      const char *name;
      const struct file_operations *fops;
      struct list_head list;
      struct device *parent;
      struct device *this_device;
      const char *nodename;
      mode_t mode;
   };

Quite a big one, huh? However, we should only take care of the first three members of the structure:

minor stands for the minor number of the device. It is preferred to set it to
              MISC_DYNAMIC_MINOR (at least for our module) unless you need some specific
              number to be used.
name  this is the name of our device as it should appear in the /dev filesystem
struct file_operations is a set of pointers to corresponding implementations of IO
              functions and a pointer to the owner module. 
              This structure is too big to be presented here, but you may find it in 

So, first of all, add another include file to your code (which we've written in the previous article) with

   #include <linux/miscdevice.h>

Then, we should add custom handlers for the functions we are interested in, namely open, release and ioctl, and the global variable in_use.

   /* We will set this variable to 1 in our open handler and reset it
       back to zero in release handler*/
   int in_use = 0;

   /* This function will be invoked each time a user process attempts
       to open our device. You should keep in mind that the prototype
      of this function may change along different kernel versions. */
   static int our_open(struct inode *inode, struct file *file)
   {
      /* We would not like to allow multiple processes to open this device */
      if(in_use)
         return -EBUSY;
      in_use++;
      printk("device has been opened\n");
      return 0;
   }

   /* This function, in turn, will be called when a process closes our device */
   static int our_release(struct inode *inode, struct file *file)
   {
      in_use--;
      printk("device has been closed\n");
      return 0;
   }

   /* This function will handle ioctl calls performed on our device */
   static long our_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
   {
      int retval = 0;
      /* We will fill this function in the Part III of this series */

      return retval;
   }

Now it's time to create the struct file_operations and struct miscdevice and populate the relevant fields:

   static const struct file_operations our_fops =
   {
      .owner = THIS_MODULE,
      .open = &our_open,
      .release = &our_release,
      .unlocked_ioctl = (void*)&our_ioctl,
      .compat_ioctl = (void*)&our_ioctl
   };

A reasonable question would be "Why do we set unlocked_ioctl and compat_ioctl to the same value and where the heck is the regular ioctl?". There is nothing special about this. unlocked_ioctl handles calls made by native processes, while compat_ioctl handles calls made by 32 bit processes running on a 64 bit kernel (compatibility mode), and it is totally normal to make them point at the same location as long as your handler function does not mess the types up. As to ioctl, it is simply not there any more...

   static struct miscdevice our_device = { MISC_DYNAMIC_MINOR, "interceptor", &our_fops };

After all this, we should make a small adjustment to our init_module function by inserting the following code:

   int retval = misc_register(&our_device);

You should also change "return 0;" to "return retval;". The code above tells the system to register a miscellaneous device described by the miscdevice structure with the kernel. In case the minor field is assigned MISC_DYNAMIC_MINOR (our case exactly), the kernel fills it with an arbitrary (from our point of view) number, otherwise the requested minor number is used.

Our cleanup_module function should have this line added:

   misc_deregister(&our_device);

in order to unregister our device and remove it from the system.

Important note: functions exported by the kernel generally return 0 upon success or a negative error code in case of failure.

By now we have a working kernel module which registers a miscellaneous device when loaded and unregisters it when unloaded. Build it now with the make command. Load it with insmod and check the content of the /dev file system. You will see that there is a new device called "interceptor" with number 10 as the major number and whatever has been assigned as the minor number. You may unload it now.

If you wish, you may try to open and close the /dev/interceptor device from a user process and check the log with dmesg | tail. You will see the lines "device has been opened" and "device has been closed" respectively. You may also try to open the device from two user processes simultaneously, then you will see that only one process may successfully open it.
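The "only one opener at a time" behaviour boils down to a flag and an error code. Here is a user space model of that logic with hypothetical names (model_open, model_release); it is a sketch of the idea, not the driver itself. Note that a plain int, both here and in the module, is a simplification: it does not protect against two processes racing through the check at the same moment.

```c
#include <errno.h>   /* EBUSY */

static int model_in_use = 0;

/* Mirrors our_open: refuse a second opener with a negative error code */
static int model_open(void)
{
   if (model_in_use)
      return -EBUSY;
   model_in_use = 1;
   return 0;
}

/* Mirrors our_release: let the next opener in */
static void model_release(void)
{
   model_in_use = 0;
}
```

The first model_open() succeeds, the second one reports -EBUSY, and after model_release() the device can be opened again, which matches what you should observe with two real processes.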

In the next section we are going to add some code to our module, which would make it possible to actually patch the sys_call_table and replace original calls with custom wrappers.

Hope this post was helpful. See you at the next one!

Hijack Linux System Calls: Part I. Modules

Have you ever tried to google for "patching Linux system call table"? There are hundreds, if not thousands, of posts regarding this problem. Most of them are outdated, as they refer to older kernels (those that still exported sys_call_table); others are about adding a custom system call and recompiling the kernel. There are a few covering modern kernels, but those are brief and, mostly, only give you a general idea of how it works. I decided to make an in-depth description of the procedure and provide a working example.

This description consists of three parts: Modules, Miscellaneous Character Drivers and System Call Table. The second part is optional, but it makes your kernel space code more flexible and usable from the user space perspective.

Modules - this part immediately follows this preamble and covers the basics of Loadable Kernel Modules.
Miscellaneous Character Drivers - here I try to provide you with in-depth explanation of what it is, how it works and, the most important - how it may be used.
System Call Table - This is going to be the shortest part as it only covers the structure of the system call table.

So, let us dive into the business.

Loadable Kernel Modules
Loadable Kernel Modules (aka LKMs) are simply extensions to the basic kernel in your operating system that may be loaded/unloaded on the fly, without a need to recompile the kernel or reboot the system. This feature exists in all major operating systems (Windows, Linux, MacOS), but we will concentrate on Linux only, preferably with kernel 2.6.38 (the one I tested the examples on) or newer.

Modules may be utilized for different purposes, like adding support for new hardware, adding a new system call or extending the kernel functionality in any other way. We are going to use a kernel module for kernel patching. Why a kernel module? The answer is simple: we cannot modify anything inside kernel space from a user process, and we will have to perform a decent set of modifications. But first of all, we need to write our module.

Go on, open your favorite source editor and start with adding the needed include files:

   #include <linux/version.h>
   #include <linux/module.h>

Basically, these two are all you need in order to build a simple kernel module that may be loaded and unloaded. We will, however, add some more includes later on.

There are two more things that we unconditionally need to implement - initialization and cleanup routines. Here we go:

   static int __init init_module(void) /* You may use any name other than
                                          init_module */
   {
      /* your initialization code goes here */
      /* Once you are done, return 0 to tell the OS that your module
         has been loaded successfully or return relevant error code
         (which must be a negative integer) */
      printk(KERN_INFO "We are in kernel space\n");
      return 0;
   }

This is our initialization routine. If you have to set up any variables or make other arrangements which are crucial for your module, you should do that here.

   static void __exit cleanup_module(void) /* Same here - you may use
                                              any name instead of
                                              cleanup_module */
   {
      printk(KERN_INFO "Elvis has left the building\n");
   }

On the other hand, the routine above is used to clean the mess we produced with our module. It is called before the module is unloaded.

As you have noticed, we used the printk function here. It is one of the most robust functions exported by the Linux kernel and is generally used to output log/diagnostic messages. You may obtain its output by issuing the dmesg command.

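Use the module_init and module_exit macros to register these routines with the kernel. Each macro takes the name of the corresponding function; since the macros themselves define the init_module and cleanup_module symbols, functions registered this way have to be given other names.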


And the last thing - add some more information to the module using the following macros:

   /* Beware that some kernel functions may not be available
      to your code if you use a license other than GPL */
   MODULE_LICENSE("GPL");
   /* Your name and email goes here */
   MODULE_AUTHOR("your name goes here");
   /* Version of your module */
   MODULE_VERSION("this string is up to you");
   /* Write a line about what this module is */
   MODULE_DESCRIPTION("describe this module here");

That's it. We have just written a skeleton kernel module. Now we have to compile it. The problem is that you cannot compile kernel modules the way you compile your applications, simply because they are not ordinary applications. You have to create a special makefile, pointing out that it is a makefile for a kernel module:

#obj stands for object
#m stands for module/driver
#this is the list of modules that the kernel building system
#needs to build
   obj-m := name_of_the_module.o
#Kernel building system (include files mostly)
#uname -r gives the version of the running kernel
   KDIR := /lib/modules/$(shell uname -r)/build
#current working directory - where to store the output
   PWD := $(shell pwd)
#default build rule (the command line below must start with a tab)
all:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

Run make and you will have the compiled module ready to be plugged in. All you need to do now is load the module into the kernel, and we have the insmod command for that purpose. Use one of the following, depending on whether your distro is Debian or Red Hat based, respectively:

sudo insmod ./your_module_name.ko


su -c "insmod ./your_module_name.ko"

You will be prompted for your or root's password (depending on the issued command), after that you should get the shell prompt. If you get no error notifications, your module has been successfully loaded; otherwise, as usual, check your code and verify it against your kernel sources. Use the following to unload your module:

sudo rmmod your_module_name


su -c "rmmod your_module_name"

You do not need the ".ko" extension here. If you get no error notification, all is good. If you do, that means that, for some reason, the system is unable to unload your module. The most frequent message that I used to get is about my module being busy. Sometimes even rmmod -f cannot resolve this issue and you have to reboot your machine in order to get your module out of the kernel's hands. Of course, you have to check your code after that for possible reasons.

Now type "dmesg | tail" to get the end of the kernel log. The last two lines should contain the strings we passed to printk function or the reason for not being able to unload the module. It is important to run this command before you reboot in case of an error in order to see the reason, otherwise, you will not find that entry.

So, we've just built our simplest module. Actually, this is not true, as the simplest module should not contain printk :). We are (hopefully) able to load and unload it.

In the next article, we will add a virtual device, that would be responsible for interaction with a user process and will perform all the patching/fixing operations.

Hope this post was helpful. See you in the next one!

Sunday, October 9, 2011

Interfacing Linux Signals

NOTE: All information provided here is related to x86 and x86-64 and may be incorrect in regard to other platforms. More than that, it may not be the same on every x86/x86-64 system, so check your kernel/libc sources first.
All source file paths are relative to your Linux Kernel source directory (most probably "/usr/src/linux") unless it is mentioned otherwise.

Sample code for this article may be downloaded from here.

The Internet is full of information on Linux signals and their usage, starting with simple "signal(SIGSEGV, foo)" examples and going through more complicated tutorials. The purpose of this article is to show the way your applications interface with the Linux kernel when it comes to signal handling.

First of all, what are signals? For Windows guys (welcome!) reading this post and for those who have not yet reached this topic while mastering Linux programming: signals are exception notifications sent to your application by the underlying operating system (the list of signals may be found in include/asm-generic/signal.h). There are several options for what your application should do upon signal reception: ignore/block, pass to the default handler, pass to a custom handler. It is important, however, to mention that SIGSTOP and SIGKILL can neither be blocked nor handled with a custom handler. We will concentrate on custom signal handlers in this article.
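The special status of SIGKILL and SIGSTOP is easy to verify from C. The sketch below uses the libc sigaction() call (covered later in this article); the helper try_custom_handler and the do-nothing handler are hypothetical names introduced for this illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* A handler that does nothing; we only care whether it can be installed */
static void ignore_signal(int signum) { (void)signum; }

/* Hypothetical helper: try to install a custom handler for signum.
   Returns 0 on success or the errno value reported by sigaction(). */
static int try_custom_handler(int signum)
{
   struct sigaction sa;

   memset(&sa, 0, sizeof(sa));
   sa.sa_handler = ignore_signal;
   sigemptyset(&sa.sa_mask);
   if (sigaction(signum, &sa, NULL) == -1)
      return errno;
   return 0;
}
```

For an ordinary signal such as SIGUSR1 the helper returns 0, while for SIGKILL and SIGSTOP the kernel rejects the request and the helper returns EINVAL.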

Good old signal()
This call is deprecated. It is still available as a system call on 32 bit systems (__NR_signal = 0x30), while on 64 bit systems it is just a libc replacement. It accepts two parameters:
  • Number of the signal (as defined in signal.h) in EBX;
  • address of the custom handler in ECX;
EAX should contain the __NR_signal value before int 0x80 (invocation of a system call in 32 bit Linux kernels). This call returns either a previous value for signal handler or SIG_ERR (0xFFFFFFFF) on error.

It is hard to track down errors while using this system call, as it only provides you with a signal number, leaving you clueless as to what has caused the exception.

The Mighty Sigaction
We have a much more powerful mechanism for signal handling nowadays. The name of this mighty mechanism is sigaction and it is described in libc's signal.h as this:

int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact);

This call allows us to set a custom signal handler in two ways. The first way is to set a "good old" simple handler which would only receive the number of the signal; the other is to ask the system to provide us with extended information about the signal and about the state of the process at the time of the exception by specifying the SA_SIGINFO flag in the struct sigaction's sa_flags field.

The sigaction(2) manual page states that the "sigaction structure is defined as something like:"

struct sigaction
{
   void      (*sa_handler)(int); /* for the simple handler */
   void      (*sa_sigaction)(int, siginfo_t *, void *); /* for the handler that
                                                           requires extended information */
   sigset_t  sa_mask;
   int       sa_flags; /* This field would hold the SA_SIGINFO flag if needed */
   void      (*sa_restorer)(void);
};

Why "something like"? The answer is simple. On many architectures there is a union instead of two separate fields (sa_handler and sa_sigaction), so we are advised not to assign both. As to the sa_restorer field, there is less information about it on the Internet than about me; we will cover this field a bit later.
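Before going down to the system call level, here is how the second, extended form looks from C. This is a minimal sketch: extended_handler, install_extended_handler and the seen_signo variable are hypothetical names, and the handler only records the signal number it finds in the siginfo_t it is given.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Filled in by the handler so that the caller can inspect it afterwards */
static volatile sig_atomic_t seen_signo = 0;

/* Extended handler: besides the signal number it receives a siginfo_t and
   a pointer to the saved context (unused in this sketch) */
static void extended_handler(int signum, siginfo_t *info, void *context)
{
   (void)signum;
   (void)context;
   seen_signo = info->si_signo;
}

/* Hypothetical helper: install extended_handler for signum.
   Returns 0 on success and -1 on failure, just like sigaction(). */
static int install_extended_handler(int signum)
{
   struct sigaction sa;

   memset(&sa, 0, sizeof(sa));
   sa.sa_sigaction = extended_handler;  /* note: not sa_handler */
   sa.sa_flags = SA_SIGINFO;            /* ask for the three argument form */
   sigemptyset(&sa.sa_mask);
   return sigaction(signum, &sa, NULL);
}
```

After install_extended_handler(SIGUSR1) succeeds, a raise(SIGUSR1) in the same single threaded program runs the handler before raise returns, and seen_signo then holds SIGUSR1.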

Low Level Implementation of Sigaction
It seems to me that a raw explanation of what and how would be too boring and may cut off a major part of the audience; therefore, I think that the best explanation is done by example. In this particular case, the example is a small program written in Assembly language for both 32 and 64 bit Linux and compiled using Flat Assembler. It has some encrypted code in its code section and an encrypted string in its data section. The program uses a custom handler for SIGSEGV in order to decode those parts and encode them again when they are no longer needed. This technique involves another mighty system call - mprotect. But before we start, forget everything that has been said about structures till now, as things become different the closer we get to the kernel.

Sigaction a la 32 bit
First of all, we need to do some preparations and define several constants and structures which we are going to use in our code. Let's start with memory protection flags for the mprotect call:

PROT_READ  = 0x00000001   ;Page may be accessed for reading
PROT_WRITE = 0x00000002   ;Page may be accessed for writing
PROT_EXEC  = 0x00000004   ;Page's content may be executed as code

Flags for the sigaction structure's sa_flags field:

SA_SIGINFO = 0x00000004   ;We need more info about the signal

Our signal number (for memory access violation):

SIGSEGV    = 0x0000000B   ;Invalid memory reference

And finally we have to define  the system calls that will be used:

__NR_exit      = 0x01
__NR_write     = 0x04
__NR_sigaction = 0x43
__NR_mprotect  = 0x7D

But before we may start coding, we have to prepare several structures. Of course, we will start with the sigaction32 structure defined in arch/x86/include/asm/ia32.h:

struc sigaction32
{
   ;Field        size      offset
   .sa_handler   dd   ?    ;0x00
   .sa_restorer  dd   ?    ;0x04
   .sa_flags     dd   ?    ;0x08
   .sa_mask      dd   ?    ;0x0C
}

sa_handler will contain the address of our custom handler, and sa_flags will be assigned the value of SA_SIGINFO, as we do need extended information about the signal and the state of the process at the time of the exception. We will ignore the sa_mask field for now.

sa_restorer cannot be ignored any more, as it pops up each time we mention sigaction. This field is deprecated and should not be defined/used on newer platforms. More than that, POSIX is not aware of this field at all. It used to contain the address of the procedure that would restore the user context once we are done with handling the signal (typically, this procedure would simply invoke the __NR_sigreturn system call). Nowadays, however, this operation is performed automatically by the VDSO.

Basically, we are set to start. So, as a first step, we need to initialize the sigaction32 structure which resides in our data section and is named sa :

   ;Setup sigaction handler
   mov [sa.sa_handler], _handler
   ;flags - we use SA_SIGINFO
   mov [sa.sa_flags], SA_SIGINFO
   ;set system call number
   mov eax, __NR_sigaction
   ;set the number of signal we want to handle
   mov ebx, SIGSEGV
   ;set the address of our sigaction structure
   mov ecx, sa
   ;EDX should contain the address of sigaction
   ;structure to store previous action. We have none
   ;so we pass NULL as last argument
   xor edx, edx
   int 0x80

The EAX register should now contain 0; otherwise, as usual, check your code. We have just set our custom handler (pointed to by _handler) for SIGSEGV. We will now cause a segmentation fault in order to decode the encrypted code we have in our program:

   mov eax, .hidden_code
.reason1:  ;Address of the first segfault
   xor dword [eax], 0x8BADF00D

This is followed by our encrypted code. You may use whatever encryption algorithm you want, I personally used simple XOR:

.hidden_code:
   ;print the "It works!" string
   mov eax, __NR_write
   mov ebx, 1
   mov ecx, str_it_works
   mov edx, str_it_works_length
   int 0x80

   ;Cause segmentation fault in order to encode this block
   mov eax, .hidden_code
.reason2:  ;Address of the second segfault
   xor dword [eax], 0x8BADF00D
.hidden_code_length = $ - .hidden_code

   ;End of our program
   mov eax, __NR_exit
   xor ebx, ebx
   int 0x80

;Macro for encryption of hidden_code
repeat .hidden_code_length
   load a byte from .hidden_code+%-1
   store byte a xor 0xE5 at .hidden_code+%-1
end repeat

Once we reach the line labeled as ".reason1", we actually reach the place in code that causes a segmentation fault. This brings us to our custom handler, _handler:

_handler:
   ;Usual prologue
   push ebp
   mov ebp,esp
   ;We will use some registers, so let's save them
   push eax ebx ecx edx

By now, we have the signum parameter at [ebp+8], pointer to siginfo structure at [ebp+12] and pointer to the ucontext_ia32 structure at [ebp+16]. Let's take a short break from coding and concentrate on those structures.

struc siginfo
{
   .si_signo    dd  ?
   .si_errno    dd  ?
   .si_code     dd  ?
   ;The rest of this structure may be a union and depends on signal
}

.si_signo - signal number;
.si_errno - an errno value, generally unused on Linux;
.si_code - signal code, which gives more information regarding the source of the signal.
Check include/asm-generic/siginfo.h for detailed layout specs for each signal. Generally speaking, this structure is designed to give exact idea of what has happened.

The next structure among the handler's arguments is ucontext_ia32. This is a snapshot of the CPU at the time of signal reception and is defined in arch/x86/include/asm/ia32.h:

struc ucontext_ia32
{
   ;Field       size                  offset
   .uc_flags    dd                 ?  ;0x00
   .uc_link     dd                 ?  ;0x04
   .uc_stack    sigaltstack_ia32      ;0x08
   .uc_mcontext sigcontext_ia32       ;0x14
   .uc_sigmask  dd                 ?  ;0x6C
}

Struct sigaltstack_ia32 is actually a definition of the type stack_ia32_t in the same header file. It describes an alternative stack for the signal handler if such exists. We make no use of this field as we use the same stack as the main process. Here is its definition in our example program:

struc sigaltstack_ia32
{
   .ss_sp    dd ?
   .ss_flags dd ?
   .ss_size  dd ?
}

But the structure we are particularly interested in is sigcontext_ia32, which may be found at offset 0x14 in ucontext_ia32:

;defined in arch/x86/include/asm/sigcontext32.h
struc sigcontext_ia32
{
   ;Field          size      offset
   .gs             dw ?      ;0x00
   .__gsh          dw ?      ;0x02
   .fs             dw ?      ;0x04
   .__fsh          dw ?      ;0x06
   .es             dw ?      ;0x08
   .__esh          dw ?      ;0x0A
   .ds             dw ?      ;0x0C
   .__dsh          dw ?      ;0x0E
   .edi            dd ?      ;0x10
   .esi            dd ?      ;0x14
   .ebp            dd ?      ;0x18
   .esp            dd ?      ;0x1C
   .ebx            dd ?      ;0x20
   .edx            dd ?      ;0x24
   .ecx            dd ?      ;0x28
   .eax            dd ?      ;0x2C
   .trapno         dd ?      ;0x30
   .err            dd ?      ;0x34
   .eip            dd ?      ;0x38
   .cs             dw ?      ;0x3C
   .__csh          dw ?      ;0x3E
   .flags          dd ?      ;0x40    EFLAGS
   .sp_at_signal   dd ?      ;0x44
   .ss             dw ?      ;0x48
   .__ssh          dw ?      ;0x4A
   .fpstate        dd ?      ;0x4C
   .oldmask        dd ?      ;0x50
   .cr2            dd ?      ;0x54
}

This structure is quite self-explanatory except, maybe, for the .fpstate field. Those familiar with Windows are used to the structure named FLOATING_SAVE_AREA, which is embedded into the CONTEXT structure; in Linux, however, this structure is stored separately and .fpstate only contains its address.

As mentioned above, this structure represents a snapshot of the CPU and, what is especially good about it, it is writable, meaning that we may alter the contents of the CPU registers in the context structure. Once we are done with our handler (unless it terminates the process), this structure, with all the modified values, will be used to restore the CPU state.
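
To see that the context really is writable, here is a minimal C sketch, assuming x86 Linux with glibc. It is not the article's program: instead of a segmentation fault it uses the two byte ud2 instruction, which raises SIGILL, and the handler simply adds 2 to the saved instruction pointer. The names skip_ud2 and resume_after_ud2 are made up for this example, and the register indices are written as numbers (they are the values of glibc's REG_RIP and REG_EIP, and also the offsets of rip and eip in the context tables divided by the register size).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <ucontext.h>

#if defined(__x86_64__)
enum { SAVED_IP = 16 };   /* glibc's REG_RIP */
#elif defined(__i386__)
enum { SAVED_IP = 14 };   /* glibc's REG_EIP */
#else
#  error "this sketch only covers x86 Linux"
#endif

static volatile sig_atomic_t handler_calls;

/* SIGILL handler: move the saved instruction pointer past the
   two byte ud2 instruction that raised the signal. */
static void skip_ud2(int signum, siginfo_t *info, void *context)
{
    ucontext_t *uc = context;
    (void)signum;
    (void)info;
    handler_calls++;
    uc->uc_mcontext.gregs[SAVED_IP] += 2;
}

/* Returns how many times the handler ran (1 when the edited context
   was restored and execution resumed after ud2), or -1 on setup error. */
int resume_after_ud2(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = skip_ud2;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    handler_calls = 0;
    __asm__ volatile("ud2");   /* raises SIGILL */

    sigaction(SIGILL, &old, NULL);
    return handler_calls;
}
```

If the kernel did not reload the CPU state from the modified structure, the process would loop on ud2 forever instead of returning from resume_after_ud2.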

The Handler
It finally happened! We have finally got to the handler function. This part has very little to do with signals and is used to dilute the tons of raw information with something (hopefully) interesting.

In this function we are going to decode the encrypted executable code and data. First of all, we need to get the address of the encrypted code in a form suitable for the mprotect function, which requires page aligned addresses. Our program is small enough to assume that the handler and the encrypted code are within the same page (I actually checked it). Thus, we first get the current EIP:

   call .get_eip
.get_eip:
   ;EBX register will be used throughout this function to hold the address of the current page
   pop ebx

   ;Make it page aligned in order to use with mprotect
   ;(this clears the low 12 bits of EBX, i.e. it assumes 4 KB pages)
   and bx, 0xF000

   ;Get the length of the region to change access permissions
   mov ecx, _start.hidden_code + _start.hidden_code_length
   sub ecx, ebx

   ;Load new protection flags (the page has to be writable for decoding)
   mov edx, PROT_READ or PROT_WRITE or PROT_EXEC
   mov eax, __NR_mprotect
   ;Call mprotect
   int 0x80

   ;We have to check the result returned by mprotect as 
   ; in case of error we would not be able to proceed
   or eax, 0
   ;If error is returned, then we simply terminate the process
   jnz _start.finish
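
The address arithmetic in this fragment can be written out in C as a quick sanity check. This is only a sketch assuming 4 KB pages; the function names and the addresses used with them are made up for this example.

```c
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u

/* Same effect as "and bx, 0xF000" on a 32 bit address:
   clear the low 12 bits and keep everything else. */
uint32_t page_align_down(uint32_t addr)
{
    return addr & ~(PAGE_SIZE_4K - 1u);
}

/* Length argument for mprotect: from the start of the page
   to the end of the hidden code. */
uint32_t region_length(uint32_t page_start, uint32_t code_start,
                       uint32_t code_length)
{
    return code_start + code_length - page_start;
}
```

For a hypothetical EIP of 0x08048123, page_align_down gives 0x08048000. mprotect only insists that the start address is page aligned; the length may be any value and the kernel rounds it up to whole pages.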

You should have noticed the labels .reason1 and .reason2 earlier. They are commented as the first and the second segfaults. We are going to use them as action indicators for our signal handler: decode the encrypted code and data if the signal has been raised by a memory access violation at .reason1, or encode them back if the violation occurred at .reason2. The next step is to check what we should do:

   ;Get the address of the ucontext_ia32 structure (the third parameter)
   mov eax, [ebp+16]

   ;The following is FASM specific macro which makes it possible to use symbolic names
   ; instead of register base + offset
   virtual at eax
      .context  ucontext_ia32
   end virtual

   ;We check for reason by comparing the value of the EIP register in ucontext_ia32
   ;against the address of .reason1
   cmp dword[.context.uc_mcontext.eip], _start.reason1
   ;I let myself assume, that it is address of .reason2 if the above 
   ;expression is not true (i.e. does not set the zero flag). But you should check
   ;that in order to avoid errors
   jnz ._reason2
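
The comment in the code above admits that anything other than .reason1 is treated as .reason2. A stricter dispatch might look like the following C sketch; the enum, the function name and the idea of a third "unexpected" outcome are assumptions made for this example, not part of the original program.

```c
#include <stdint.h>

enum fault_action {
    FAULT_DECODE,      /* violation at .reason1: decode code and data */
    FAULT_ENCODE,      /* violation at .reason2: encode them back     */
    FAULT_UNEXPECTED   /* anything else: a genuine bug, do not resume */
};

/* Decide what the handler should do from the saved instruction pointer. */
enum fault_action classify_fault(uintptr_t saved_ip,
                                 uintptr_t reason1, uintptr_t reason2)
{
    if (saved_ip == reason1)
        return FAULT_DECODE;
    if (saved_ip == reason2)
        return FAULT_ENCODE;
    return FAULT_UNEXPECTED;
}
```

In the assembly version this amounts to a second cmp against _start.reason2, followed by a jump to the termination code when neither address matches.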

So, we have found that the value of the EIP register from the ucontext_ia32 structure equals _start.reason1. In this case, save the EAX and EBX registers on the stack and perform all the operations needed to decode the encrypted parts of your program.

[Image: content of the _start function before decryption of the .hidden_code]

[Image: content of the _start function after the .hidden_code has been decrypted]
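
The decoding routine itself is left to the reader. Purely for illustration, here is a C sketch that assumes a single byte XOR key, which is an assumption of this example and not necessarily what the original program uses. XOR is convenient here because the same routine both decodes and encodes, which matches the two roles of our handler.

```c
#include <stddef.h>

/* XOR every byte of the region with the key. Applying the function
   twice with the same key restores the original bytes. */
void xor_region(unsigned char *buf, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}
```

In the real handler, buf would be the address of .hidden_code (and then of the string) and len the corresponding length, once the page has been made writable.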

Once we have decoded all the encrypted parts, we need to modify the value of the EIP register in ucontext_ia32 so that it points to .hidden_code:

   mov dword [.context.uc_mcontext.eip], _start.hidden_code

We are almost done with the handler, however, there is one little thing to be done - we have to write protect our code section again. The EBX register should still point to the beginning of the page which contains our code section (unless you forgot to back it up), so all we have to do is the following:

   ;Get the size of the region
   mov ecx, _start.hidden_code + _start.hidden_code_length
   sub ecx, ebx
   mov eax, __NR_mprotect
   mov edx, PROT_READ or PROT_EXEC
   int 0x80

Now we may restore the stack and ret from the handler.

Because we modified EIP in the ucontext_ia32 structure, our program will continue from .hidden_code instead of trying to execute the code at .reason1 again. As the code has been deciphered by our signal handler, it will do what it was designed to do, namely output a string (in our case, "It works!"). The last operation performed by .hidden_code is an attempt to write to the code section (which we made write protected again). This, in turn, causes another segmentation fault, but this time the EIP in the ucontext_ia32 structure contains the address of .reason2, so our handler encodes the .hidden_code and the string back and sets EIP to point to _start.finish.
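
The fault, handle, resume round trip can also be tried from C. The sketch below is a simplified variant, not the article's program: it does not touch the saved EIP at all. It writes to a read only anonymous page, and the SIGSEGV handler makes the page writable so that the kernel can restart the faulting store. All names are made up for this example, and calling mprotect from a signal handler is an assumption that holds on Linux in practice, although it is not on the POSIX list of async-signal-safe functions.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *guarded_page;
static size_t guarded_size;
static volatile sig_atomic_t segv_calls;

/* SIGSEGV handler: if the fault address lies in our page, make the page
   writable and return; the faulting store is then executed again. */
static void unprotect_on_fault(int signum, siginfo_t *info, void *context)
{
    unsigned char *addr = info->si_addr;
    (void)signum;
    (void)context;
    segv_calls++;
    if (addr < guarded_page || addr >= guarded_page + guarded_size ||
        mprotect(guarded_page, guarded_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);   /* not our fault, or no way to recover */
}

/* Returns 0 if exactly one fault was taken and the store went through. */
int write_through_fault(void)
{
    struct sigaction sa, old;
    long page = sysconf(_SC_PAGESIZE);
    int ok;

    if (page <= 0)
        return -1;
    guarded_size = (size_t)page;
    guarded_page = mmap(NULL, guarded_size, PROT_READ,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = unprotect_on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(guarded_page, guarded_size);
        return -1;
    }

    segv_calls = 0;
    *(volatile unsigned char *)guarded_page = 0x2A;   /* faults once */
    ok = segv_calls == 1 &&
         *(volatile unsigned char *)guarded_page == 0x2A;

    sigaction(SIGSEGV, &old, NULL);
    munmap(guarded_page, guarded_size);
    return ok ? 0 : -1;
}
```

The difference from our handler is where execution resumes: here the same instruction is retried, while the assembly version redirects EIP to a different address before returning.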

At the end, if we try to run our program, we get the following output:

By now, we know how to interact with the sigaction system call at the lowest level (you decide whether it is really needed ;-) ) on 32 bit Linux (PC).

Sigaction on 64 bit. Is it much different?
If we try to implement the above example on a 64 bit Intel platform, it would be the same algorithmically. We would use almost the same system calls and almost the same flags. I will not cover the whole process here; instead, let us concentrate on the major differences.

First of all, the system call numbers are different:

__NR_exit         = 0x3C
__NR_write        = 0x01
__NR_rt_sigaction = 0x0D
__NR_mprotect     = 0x0A
__NR_rt_sigreturn = 0x0F

Another difference is that there is no such thing as sys_sigaction on the 64 bit platform. This does not mean that there is no such option at all. We will use the sys_rt_sigaction system call, which slightly differs from the good old sys_sigaction. The main difference is that we have to specify one additional parameter: the size of sa_mask in bytes, which must match the size of the kernel's own sigset_t (8 bytes on x86-64). Here are the same structures that we used in the 32 bit example, adapted for 64 bits. All structure definitions are taken from arch/x86/include/asm/sigcontext.h except struct sigaction, which was taken from libc sources (it is called struct kernel_sigaction there).

struc sigaction
   .__sigaction_handler  dq ?     ;Address of the handler
   .sa_flags             dq ?     ;In this example we set this field
                                  ;to "SA_SIGINFO or SA_RESTORER"
   .sa_restorer          dq ?     ;Perhaps my platform is not as
                                  ;new as I thought,
                                  ;but I had to specify a restorer
                                  ;procedure too.
                                  ;You may try both and see whether rt_sigaction
                                  ;returns an error. If this is the case, add
                                  ;a restorer procedure like this
                                  ;   mov eax, __NR_rt_sigreturn
                                  ;   syscall
   .__val                sigset_t ;Mask

This is how you set up this structure:

   mov r10, _NSIG_WORDS                   ;Size of the mask in bytes (must be 8)
   mov [sa.__sigaction_handler], _handler ;Address of our custom handler
   mov [sa.sa_flags], SA_SIGINFO or SA_RESTORER
   mov [sa.sa_restorer], _restorer        ;Address of the restorer procedure
   xor rdx, rdx                           ;We do not specify the oldact
   mov rsi, sa                            ;Load the address of our sigaction structure
   mov rdi, SIGSEGV                       ;Set signal number
   mov rax, __NR_rt_sigaction
   syscall                                ;Call rt_sigaction

The _NSIG_WORDS constant is calculated this way:

_NSIG       = 64
_NSIG_BPW   = 8
_NSIG_WORDS = _NSIG / _NSIG_BPW   ;= 8, the size of the kernel's sigset_t in bytes
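
Whether the kernel accepts a given value in R10 can be probed without installing any handler. The C sketch below calls rt_sigaction directly through syscall() with both the new and the old action set to NULL, so only the fourth argument matters. The function name is made up for this example, and the expectation that 8 is the accepted size is an assumption that holds for x86-64.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for nothing (no new action, no old action) and vary
   only the size argument. Returns 0 if it was accepted, else errno. */
int probe_sigsetsize(size_t sigsetsize)
{
    long rc = syscall(SYS_rt_sigaction, (long)SIGSEGV,
                      NULL, NULL, sigsetsize);
    return rc == 0 ? 0 : errno;
}
```

On x86-64, probe_sigsetsize(8) succeeds, while any other size, for example 1, is rejected with EINVAL.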

The rest of the structures are:

struc sigcontext
   ;Field     size   offset
   .r8         dq ?  ;0x00
   .r9         dq ?  ;0x08
   .r10        dq ?  ;0x10
   .r11        dq ?  ;0x18
   .r12        dq ?  ;0x20
   .r13        dq ?  ;0x28
   .r14        dq ?  ;0x30
   .r15        dq ?  ;0x38
   .rdi        dq ?  ;0x40
   .rsi        dq ?  ;0x48
   .rbp        dq ?  ;0x50
   .rbx        dq ?  ;0x58
   .rdx        dq ?  ;0x60
   .rax        dq ?  ;0x68
   .rcx        dq ?  ;0x70
   .rsp        dq ?  ;0x78
   .rip        dq ?  ;0x80
   .rflags     dq ?  ;0x88
   .cs         dw ?  ;0x90
   .gs         dw ?  ;0x92
   .fs         dw ?  ;0x94
   .__pad0     dw ?  ;0x96
   .err        dq ?  ;0x98
   .trapno     dq ?  ;0xA0
   .oldmask    dq ?  ;0xA8
   .cr2        dq ?  ;0xB0
   .fpstate    dq ?  ;0xB8
   .reserved   rq 8  ;0xC0

struc sigaltstack
   .ss_sp      dq ?
   .ss_flags   dq ?
   .ss_size    dq ?

struc sigset_t
   repeat _NSIG_WORDS
      dq ?
   end repeat

struc ucontext
   .uc_flags    dq ?
   .uc_link     dq ?
   .uc_stack    sigaltstack
   .uc_mcontext sigcontext
   .uc_sigmask  sigset_t
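
As with the 32 bit structures, the offsets can be cross-checked in C. Again this is only a sketch: the structures are rebuilt from fixed-width types following the field lists above, the _64 suffixes and the helper saved_rip_offset are made up for this example, and the signal mask is represented by a single 64 bit word because only its offset is of interest here.

```c
#include <stddef.h>
#include <stdint.h>

struct sigcontext_64 {
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rdi, rsi, rbp, rbx, rdx, rax, rcx, rsp, rip, rflags;
    uint16_t cs, gs, fs, __pad0;
    uint64_t err, trapno, oldmask, cr2, fpstate;
    uint64_t reserved[8];
};

struct sigaltstack_64 {
    uint64_t ss_sp;
    uint64_t ss_flags;
    uint64_t ss_size;
};

struct ucontext_64 {
    uint64_t uc_flags;
    uint64_t uc_link;
    struct sigaltstack_64 uc_stack;
    struct sigcontext_64 uc_mcontext;
    uint64_t uc_sigmask;   /* first word of the mask; offset check only */
};

/* Hypothetical helper: distance from the start of the context
   structure to the saved RIP value. */
size_t saved_rip_offset(void)
{
    return offsetof(struct ucontext_64, uc_mcontext)
         + offsetof(struct sigcontext_64, rip);
}
```

So in the 64 bit handler the saved RIP is found 0x28 + 0x80 bytes from the address passed as the third parameter.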

When it comes to structures, the most visible difference (in addition to some other fields in sigcontext) is the size: almost everything is 64 bits wide instead of 32. All the rest is very similar to what we have done in the previous section.

There is almost no difference between the 32 and 64 bit handlers except the way the latter receives its parameters, as they are passed via registers instead of being pushed on the stack. In this case we have:

   RDI = signum;
   RSI = siginfo_t*;
   RDX = ucontext*;

This follows the AMD64 ABI, which is beyond the scope of this article.
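
For comparison, this is what the same three parameters look like from C, where the compiler takes care of the registers. It is a minimal sketch with made up names: the handler only records what it received, and the signal is SIGUSR1 sent with raise(), so nothing has to fault.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t seen_signum;
static volatile sig_atomic_t seen_si_signo;
static volatile sig_atomic_t seen_context;

/* The three parameters: signal number, siginfo_t pointer and a
   pointer to the saved context (passed as void pointer in C). */
static void record_args(int signum, siginfo_t *info, void *context)
{
    seen_signum = signum;
    seen_si_signo = info->si_signo;
    seen_context = (context != NULL);
}

/* Returns 0 if the handler saw consistent arguments, -1 otherwise. */
int handler_args_demo(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_args;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    seen_signum = 0;
    seen_si_signo = 0;
    seen_context = 0;
    raise(SIGUSR1);   /* the handler runs before raise() returns */

    sigaction(SIGUSR1, &old, NULL);
    return (seen_signum == SIGUSR1 && seen_si_signo == SIGUSR1 &&
            seen_context) ? 0 : -1;
}
```

The libc wrapper used here also supplies the restorer and the mask size on our behalf, which is exactly the part we had to do by hand in the assembly version.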

Hope this post was helpful. See you at the next one!