This is the last part of the Hijack Linux System Calls series. By now, we have created a simple loadable kernel module which registers a miscellaneous character device. This means that we have almost everything we need in order to patch the system call table. We still have to fill in the our_ioctl function and add a couple of declarations to our source file. By the end of this article we will be able to intercept any system call in our system, should there be a need for that.
System Call Table
The system call table is simply an area in kernel memory that contains the addresses of system call handlers, and a system call number is an index into that table. This means that when we call write (to be more precise, when libc invokes sys_write on our behalf) on a 32 bit system, number 4 is passed in the EAX register before int 0x80. This simply tells the kernel to go to the system call table, take the entry at index 4 and call the function that entry points to. On a 64 bit system it is number 1 in RAX (and syscall instead of int 0x80). System call numbers are defined in arch/x86/include/asm/unistd_32.h and arch/x86/include/asm/unistd_64.h for 32 and 64 bit platforms respectively. In this article, we are going to deal with the sys_open system call, which is number 5 on 32 bit systems and number 2 on 64 bit systems.
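The number-to-handler mapping described above can be observed from user space. The following is a minimal sketch, assuming Linux with glibc: syscall() and the SYS_write constant come from glibc's headers, while raw_write is a name made up for this illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: invoke the write system call by its number,
 * bypassing the usual write() wrapper. SYS_write is 1 on x86-64 and
 * 4 on 32 bit x86, matching the numbers in unistd_64.h / unistd_32.h. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hello\n", 6) should behave like write(1, "hello\n", 6): the kernel receives the number, looks up the matching entry in the system call table and calls it.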
Since modern kernels no longer export the sys_call_table symbol, we have to find its location in memory ourselves. There are some "hackish" ways of finding sys_call_table programmatically, but they are not reliable: they may work on one kernel and fail on another, especially the way they are usually written. Therefore, we are going to use the simplest and safest way and read its location from the /boot/System.map file. For simplicity, we will just use grep and hardcode the address. On my computer, the command grep "sys_call_table" /boot/System.map (check the file name on your system; on mine it is /boot/System.map-2.6.38-11-generic) gives the output "ffffffff816002e0 R sys_call_table". Add a global variable unsigned long *sys_call_table = (unsigned long*)0xYour_Address_Of_Sys_call_table.
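If you would rather not copy the address by hand, the same lookup can be done by a small user space helper. This is only a sketch: the function names parse_symbol_line and find_symbol are invented here, and it assumes the "address type name" line layout shown in the grep output above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 and stores the address if `line`
 * describes exactly `symbol`, otherwise returns 0. */
int parse_symbol_line(const char *line, const char *symbol,
                      unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char name[128];

    if (sscanf(line, "%llx %c %127s", &value, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;
    *addr = value;
    return 1;
}

/* Hypothetical helper: scan a System.map style file for `symbol`.
 * Returns 1 on success, 0 if the file cannot be read or the symbol
 * is not listed. */
int find_symbol(const char *path, const char *symbol,
                unsigned long long *addr)
{
    char line[256];
    int found = 0;
    FILE *map = fopen(path, "r");

    if (!map)
        return 0;
    while (!found && fgets(line, sizeof(line), map))
        found = parse_symbol_line(line, symbol, addr);
    fclose(map);
    return found;
}
```

Comparing the whole name, rather than searching for a substring as grep does, avoids picking up other symbols whose names merely contain "sys_call_table".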
Preparations
We will start, as usual, by adding new includes to our code. This time, those include files are:
#include <linux/highmem.h>
#include <asm/unistd.h>
The first one is needed due to the fact that system call table is located in read only memory area in modern kernels and we will have to modify the protection attributes of the memory page containing the address of the system call that we want to intercept. The second one is self explanatory after the previous paragraph. We are not going to use hardcoded values for system calls, instead, we will use the values defined in unistd.h header.
Now we define two values which will be used as the cmd argument to the our_ioctl function. One will tell us to patch the table, the other will tell us to fix it by restoring the original value.
/* IOCTL commands */
#define IOCTL_PATCH_TABLE 0x00000001
#define IOCTL_FIX_TABLE 0x00000004
Add one more global variable, int is_set=0, which will be used as a flag telling whether the real (0) or the custom (1) system call is in use.
It is important to save the address of the original sys_open as we are not going to fully implement our own, instead, our function will log information about the call arguments and then perform the actual (original) call. Therefore, we define a function pointer (for original call) and a function (for custom call):
/* Pointer to the original sys_open */
asmlinkage int (*real_open)(const char* __user, int, int);
/* Our replacement */
asmlinkage int custom_open(const char* __user file_name, int flags, int mode)
{
    printk("interceptor: open(\"%s\", %X, %X)\n", file_name, flags, mode);
    return real_open(file_name, flags, mode);
}
You have noticed the "asmlinkage" attribute. Well, it is, actually, a define for the attribute. We will not go that deep this time; I will just say that this attribute tells the compiler how it should pass arguments to the function, given that it is being called from assembly code. The "__user" macro signifies that the argument points into user space and that the function must copy it to kernel space when needed. We do not need that, meaning that we may ignore it for now.
Another crucial pair of functions is the one that allows us to modify the memory page protection attributes directly. One may say that this is risky, but, in my opinion, it is less risky than actually patching the system call table: first of all, it is architecture dependent and we know that architectures do not change drastically; second, we use kernel functions for that.
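/* Make the page writable */
int make_rw(unsigned long address)
{
    unsigned int level;
    pte_t *pte = lookup_address(address, &level);
    /* lookup_address returns NULL if the address is not mapped */
    if(!pte)
        return -1;
    /* Set the RW bit only if it is not set already */
    if(!(pte->pte & _PAGE_RW))
        pte->pte |= _PAGE_RW;
    return 0;
}
/* Make the page write protected */
int make_ro(unsigned long address)
{
    unsigned int level;
    pte_t *pte = lookup_address(address, &level);
    if(!pte)
        return -1;
    /* Clear the RW bit, leave all other bits untouched */
    pte->pte = pte->pte &~ _PAGE_RW;
    return 0;
}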
pte_t stands for typedef struct { unsigned long pte; } pte_t and represents a page table entry. Although it is simply an unsigned long, it is declared as a struct in order to avoid type misuse.
pte_t *lookup_address(unsigned long address, unsigned int *level) is provided by the kernel. It performs all the dirty work for us and returns a pointer to the page table entry that describes the page containing the address. This function accepts the following arguments:
address - an address in virtual memory;
level - pointer to an unsigned integer which receives the level of the mapping.
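What make_rw and make_ro do to an entry can be tried out safely in user space. The sketch below is only a simulation: demo_pte_t and the demo_ functions are made up for this example, and DEMO_PAGE_RW assumes that the writable flag is bit 1 of the entry, as it is on x86.

```c
#include <stdint.h>

/* Assumption: on x86 the "writable" flag is bit 1 of a page table entry */
#define DEMO_PAGE_RW ((uint64_t)1 << 1)

/* Stand-in for the kernel's pte_t: a struct wrapping one integer */
typedef struct { uint64_t pte; } demo_pte_t;

/* Non-zero if the entry currently allows writes */
int demo_is_writable(const demo_pte_t *entry)
{
    return (entry->pte & DEMO_PAGE_RW) != 0;
}

/* Set the writable bit, leaving every other bit as it was */
void demo_make_rw(demo_pte_t *entry)
{
    entry->pte |= DEMO_PAGE_RW;
}

/* Clear the writable bit, leaving every other bit as it was */
void demo_make_ro(demo_pte_t *entry)
{
    entry->pte &= ~DEMO_PAGE_RW;
}
```

Note that the test for "not writable yet" is !(pte & flag). The expression pte & ~flag is non-zero for practically any valid entry, so it cannot be used to tell the two states apart.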
Let's Get to Business
We are almost there. The only thing left is the actual implementation of the our_ioctl function. Add the following lines:
switch(cmd)
{
    case IOCTL_PATCH_TABLE:
        /* Patch only once, otherwise real_open would end up pointing at custom_open */
        if(!is_set)
        {
            make_rw((unsigned long)sys_call_table);
            real_open = (void*)*(sys_call_table + __NR_open);
            *(sys_call_table + __NR_open) = (unsigned long)custom_open;
            make_ro((unsigned long)sys_call_table);
            is_set=1;
        }
        break;
    case IOCTL_FIX_TABLE:
        /* Restore only if the table is currently patched */
        if(is_set)
        {
            make_rw((unsigned long)sys_call_table);
            *(sys_call_table + __NR_open) = (unsigned long)real_open;
            make_ro((unsigned long)sys_call_table);
            is_set=0;
        }
        break;
    default:
        printk("Ooops....\n");
        break;
}
And these lines to the cleanup_module function:
if(is_set)
{
    make_rw((unsigned long)sys_call_table);
    *(sys_call_table + __NR_open) = (unsigned long)real_open;
    make_ro((unsigned long)sys_call_table);
}
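The save, replace and restore sequence is easier to follow on a toy model. The sketch below is a user space simulation, not kernel code: the table, the handlers and every demo_ name are invented for this illustration, and the table is an ordinary writable array, so no page protection is involved.

```c
#include <stdio.h>

#define DEMO_NR_OPEN    2   /* index of the "open" handler in the toy table */
#define DEMO_TABLE_SIZE 4

typedef int (*demo_handler_t)(const char *path);

int demo_calls_logged;            /* how many calls the interceptor has seen */
int demo_is_set;                  /* 0: original handler, 1: replacement */
demo_handler_t demo_saved_open;   /* plays the role of real_open */

/* Stand-in for the original handler: pretends to return descriptor 3 */
int demo_real_open(const char *path)
{
    return path[0] != '\0' ? 3 : -1;
}

/* Toy "system call table": an array of handler pointers indexed by number */
demo_handler_t demo_table[DEMO_TABLE_SIZE] = {
    [DEMO_NR_OPEN] = demo_real_open,
};

/* Replacement handler: log the call, then forward it to the saved original */
int demo_custom_open(const char *path)
{
    demo_calls_logged++;
    printf("interceptor: open(\"%s\")\n", path);
    return demo_saved_open(path);
}

/* Save the current entry and install the replacement (at most once) */
void demo_patch(void)
{
    if (demo_is_set)
        return;
    demo_saved_open = demo_table[DEMO_NR_OPEN];
    demo_table[DEMO_NR_OPEN] = demo_custom_open;
    demo_is_set = 1;
}

/* Put the saved entry back (only if the table is currently patched) */
void demo_restore(void)
{
    if (!demo_is_set)
        return;
    demo_table[DEMO_NR_OPEN] = demo_saved_open;
    demo_is_set = 0;
}

/* What the kernel does on entry: index the table and call the handler */
int demo_dispatch(int nr, const char *path)
{
    return demo_table[nr](path);
}
```

The flag checks matter. Patching twice without them would store the replacement as the "original", so the replacement would end up calling itself, and restoring before patching would write an empty pointer into the table.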
Our interceptor module is ready. Well, almost ready as we need to compile it. Do that as usual - make.
Test
Finally, we have our module set and ready to use, but we still have to create a "client" application, the code that will "talk" to our module and tell it what to do. Fortunately, this is much simpler than the rest of the work that we have done here. Create a new source file and enter the following lines:
#include <stdio.h>
#include <unistd.h> /* sleep(), close() */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
/* Define ioctl commands */
#define IOCTL_PATCH_TABLE 0x00000001
#define IOCTL_FIX_TABLE 0x00000004
int main(void)
{
    int device = open("/dev/interceptor", O_RDWR);
    /* Bail out if the module is not loaded or the device is not accessible */
    if(device < 0)
    {
        perror("open");
        return 1;
    }
    ioctl(device, IOCTL_PATCH_TABLE);
    sleep(5);
    ioctl(device, IOCTL_FIX_TABLE);
    close(device);
    return 0;
}
Save it as manager.c and compile it with gcc -o manager manager.c.
Load the module, run ./manager and unload the module when manager exits. Then issue the dmesg | tail command. If you see lines containing "interceptor: open(blah blah blah)", you know that those lines were produced by our handler.
Now we are able to intercept system calls in modern kernels despite the fact that sys_call_table is no longer exported. Although we deal with low level structures which are normally used only by the kernel, this is still a relatively safe method as long as your module is compiled against the running kernel.
Hope this post was helpful. See you at the next one!
Thanks Alexey- great job you do!
Then follow some questions:
sys_call_table address is not any more exported with new kernels. I use 3.2.0. You can get address for ex. as "sudo grep sys_call_table /boot/System.map-3.2.0" But what about the future? Map is changed?
Brute force method works with __NR_... seeking whole memory. But what then when __NR_... is not exported any more?
To be more specific: to seek whole memory as "ptr[__NR_close] == (unsigned long) sys_close".
If sys_close is not exported any more?
I know what you mean by all this. It has become a problem since the first time sys_call_table was not exported.
What you can do, instead of recompiling your code with each new kernel version (or each time you recompile your kernel), is to add support for the write operation on the device file. This way you can write the address obtained from System.map (which you can obtain automatically by parsing the file with a userspace program). Alternatively, implement an additional command for ioctl() (IOCTL_SET_SCT, for example) and pass the address of the sys_call_table as an additional parameter.
This "not exporting symbols" is very annoying to me.
Linux is supposed to be "free", so why not to give symbols.
Am I right, that it is something to do with "security"?
But if module is running with all privileges, security is not possible!
Anyway, I did a small demonstration showing that this policy do not help.
I did a LKM which guarantees super user rights always when asked.
Basically with very few code lines.
-----------------------
Test program (not super user rights):
int main(int argc,char *argv[])
{
setuid(0);/* Now you got su rights, if LKM loader!)*/
system("/bin/bash");
}
-------------------
Module outline:
#define ADDRESS 0x0ffffffff8107f820
/* ADDRESS (depends on the kernel configuration) got as:
>sudo grep sys_setuid /boot/System.map-3.2.0-17-generic
ffffffff8107f820 T sys_setuid
ffffffff810a27f0 T sys_setuid16
(Of course this can be made "automatic" by: Get address by awk-
pass address as module parameter....-not done here..........
...........and can be made not dependent on System.map, more complicated......)
*/
............................
ALMOST ALL NEEDED IN init_module:
............
p=(char *)(ADDRESS+0x4d);
make_rw((unsigned long long )p);
*(p) = (char)0xeb; /* Works for at least kernel 3.2-tested */...........
..........................
Must update previous. I installed Ubuntu 12.04 with kernel 3.2.0.23.
The program:
int main(int argc,char *argv[])
{
setuid(0);/* Now you got su rights, if LKM loader!)*/
system("/bin/bash");
}
don't work anymore!(I mean EXACTLY as above).
Somebody reading this- or otherwise?
Honestly, I do not understand how your code is related to this article?
Honestly, I am sorry about the inconvenience.
Please, delete my posts.
As an answer: First posting I was talking about kernel module development as you.
(You did not complain)
The second post was only because of "feeling of responsibility".
I was giving information which is wrong nowadays. If somebody tries my
code .................sorry.
No problem. There was no inconvenience. But I cannot delete your posts. Since you are anonymous, I cannot tell whether those posts are truly yours or not. So, I will leave them here.
Thanks for sharing. I went through all three of your tutorials in this series, but my source code won't compile based on those articles; multiple errors appeared. I am a newbie in Linux kernel module programming. I just want to know if you can provide a single workable source code that can intercept sys_open() in kernels above 2.6.38, for example Ubuntu 12.04 with kernel 3.2.0. Thank you very much.
First of all, sorry for the delay.
These articles contain all the references needed to make the code run on your system. Besides, these articles are not a tutorial. This is just a demonstration. Read this, use your kernel source and it will run.
Thank you for a great tutorial on this.
I'm trying to build this for MIPS architecture. It seems that the symbol lookup_address is not available on that arch. I get:
error: implicit declaration of function 'lookup_address'
... when I try to build. On x86 it works fine. Do you have any idea how to fix this?
Hi Fredrik,
glad you found this useful.
Could you drop me a line to my private email (on the "Contact information" page) and we'll go through that.
Thanks Alexey for publishing this article.
I would like to know how a system call is interfaced with an LKM. AFAIK, when we write to a device file, say /dev/fpga, using the write() call in userspace with 3 arguments, the call is linked to sys_write in kernel space, which is further linked to the LKM. Now, how is this linking between sys_write and the LKM maintained?
Hi Amit,
to put it simply, the kernel knows whether we are trying to open/read/write/close a file on disk or a device node. In case of a device node, the creator of the LKM has to implement all the needed "IO" functionality and populate the file_operations structure with pointers to those implementations. This structure is passed to the kernel upon module loading, so the kernel simply calls your function.
Hi Alexey,
Thanks to your article I was able to do something. Sorry for my ignorance beforehand but I want to ask a question.
I don't want to rebuild the kernel all again so is it possible to implement a new system call with your method? If not at least I am planning to add a new system call which literally does nothing and then intercept it with custom modules. That would still require building the kernel once but it is more convenient for debugging. What do you think, would it work?
AFAIK, there is no easy way to add a system call without recompiling the kernel. That would involve too much kernel patching. So, basically, if it is possible to recompile - do that, otherwise - dig kernel sources for system call related stuff and patch it.
P.S. Thanks for the question, it is worth trying myself and writing another article ;)