Source code for this article may be found here.
Sometimes a need may arise to start a thread in a separate process, and the need is not necessarily malicious. For example, one may want to replace library functions or to place some code between the executable and a library function. However, Linux does not provide a system call that does anything similar to the CreateRemoteThread Windows API, despite the fact that I see people searching for such functionality. You may google for "CreateRemoteThread equivalent in Linux" yourself and see that at least 90% of the results end up with something like "why would you want to do that?" There is a certain type of people in forums who, most likely, think that if they do not have an answer then it probably does not exist and no one would ever need it. Others truly believe that if they know why, they can tell you how to do it another way. The latter is sometimes true, but most of the time the solution being requested is the only acceptable one, and that's what people refuse to understand.
So, let's say you need to inject a thread into a running process for whatever reason (maybe you want to perform a "DLL injection" the Linux way - your business). Although there is no specific system call that allows this, there are plenty of other system calls and library functions that would "happily" assist you.
Unavoidable ptrace()
The first time you take a look at ptrace(), it is a bit frightening (just like ioctl()): one function, lots of possible requests, and go figure out when and which parameter is being ignored. In practice, it is quite simple. This function is used by debuggers and in cases when one needs to monitor the execution of a process for whatever reason. We will use this function for thread injection in this article.
The first thing you would want to do is to attach to the target process:
ptrace(PTRACE_ATTACH, pid, NULL, NULL);
PTRACE_ATTACH - request to attach to a running process;
pid - the ID of the process you want to attach to.
If the return value is 0 - voila, you are attached. If it is -1, however, an error has occurred and you need to check errno to know what has happened. Keep in mind that on certain systems you may not be able to attach to a process which is not a descendant of the attaching one or has not specified it as its tracer (using prctl()). For example, this is exactly the situation in Ubuntu since 10.10. If you want to change that, you need to locate your ptrace.conf file (/etc/sysctl.d/10-ptrace.conf on Ubuntu) and set kernel.yama.ptrace_scope to 0.
Since I am using Ubuntu, I can only attach to a child process (unless I want some additional headache), so this is what I am going to cover in this article.
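Before trying to attach to an unrelated process, it may help to check the current setting. The following is only a minimal sketch: it assumes the Yama security module exposes the value at /proc/sys/kernel/yama/ptrace_scope, and read_ptrace_scope is a name made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: returns the number stored in the given file
 * (0 = classic ptrace permissions, 1 = restricted to descendants, ...),
 * or -1 if the file cannot be read, e.g. when Yama is not enabled. */
int read_ptrace_scope(const char *path)
{
    FILE *f = fopen(path, "r");
    int scope = -1;

    if (NULL == f)
        return -1;
    if (1 != fscanf(f, "%d", &scope))
        scope = -1;
    fclose(f);
    return scope;
}
```

Calling read_ptrace_scope("/proc/sys/kernel/yama/ptrace_scope") and getting anything other than 0 suggests that attaching to a non-child process will be refused for a regular user.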
Preparations
The first step, just like in the case of Windows, is to write an injector. It will load the victim process, inject the shellcode and exit. This is the simplest part, and the skeleton of such a loader would look like this:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>

int main(int argc, char** argv)
{
    pid_t pid;
    int status;

    if(argc < 2)
    {
        fprintf(stderr, "Usage: %s <victim executable>\n", argv[0]);
        return EXIT_FAILURE;
    }

    pid = fork();
    if(-1 == pid)
    {
        perror("fork");
        return EXIT_FAILURE;
    }

    if(0 == pid)
    {
        // We are in the child process, so we just ptrace() and execl()
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        // The victim's argv[0] is its own path; the list must end with NULL
        execl(argv[1], argv[1], (char*)NULL);
        // execl() only returns on failure
        perror("execl");
        _exit(EXIT_FAILURE);
    }
    else
    {
        // We are in the parent (injector)
        // Wait for exec in the child; it stops with SIGTRAP
        waitpid(pid, &status, 0);
        // Options can only be set while the child is stopped;
        // they go in the fourth (data) argument
        ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEEXEC);
        // The rest of the code comes here
    }
    return 0;
}
As you can see, the loader forks and then behaves depending on the return value of the fork() function. If it returns -1, an error has occurred. If it returns 0, we are in the child process; otherwise, it is the pid of the child process and we are in the parent.
Child
The child code does not have too many things to do. All that needs to be done is to tell the OS that it may be traced and replace itself with the victim executable by calling execl().
Parent
In the case of the parent, the situation is quite different and much more complicated. Because the child has called PTRACE_TRACEME, the kernel stops it with SIGTRAP as soon as execl() succeeds, so you first waitpid() for that stop. Only then, while the victim is stopped, can you call ptrace() with PTRACE_SETOPTIONS; PTRACE_O_TRACEEXEC (passed as the fourth argument) tells the OS that you want to get a notification whenever the victim process issues sys_execve later on.
When waitpid() returns (and you should check the return value for -1, which means an error), it is still not the best time to start the injection, especially given that you may have no idea of what is where in the victim process. The next step is to wait for a system call to occur (and it would be good to skip a couple of system calls, so that the victim may initialize properly) by telling the OS:
ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
followed by a loop:
while(1)
{
    if(-1 == waitpid(pid, &status, 0))
    {
        // Some error occurred. Print a message and
        break;
    }
    if(WIFEXITED(status))
    {
        // The victim process has terminated. Print a message and
        break;
    }
    if(WIFSTOPPED(status))
    {
        // Here comes the actual injection code. Actually, all its stages.
    }
    if(WIFSIGNALED(status))
    {
        // The victim process received a signal and terminated. Print a message and
        break;
    }
}
// All done.
return 0;
Injection
You should introduce a variable to count stages. Let's name it step.
Stage 0 (step = 0)
I have not mentioned it, but ptrace() would notify you twice during a system call. First time right before the system call (so you can inspect registers), the second notification would arrive right after system call's completion (so you can inspect the return value). Therefore, this time we do nothing, but resume the traced victim:
ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
and increment the stage variable.
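One way to tell these system call notifications apart from other stops (a pending signal, for example) is the PTRACE_O_TRACESYSGOOD option. It is not used anywhere in this article, so treat it as an assumption of this sketch: with the option set through PTRACE_SETOPTIONS, the kernel reports syscall stops as SIGTRAP | 0x80. The helper name is_syscall_stop is made up for the example.

```c
#include <signal.h>
#include <sys/wait.h>

/* Returns 1 if the waitpid() status describes a syscall-enter or
 * syscall-exit stop. Assumes PTRACE_O_TRACESYSGOOD was set on the victim,
 * so such stops carry SIGTRAP with bit 7 set. */
int is_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}
```

Inside the WIFSTOPPED branch of the loop, a check like this could decide whether the stop should advance step or simply be resumed.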
Stage 1 (step = 1)
Back up the victim's registers and the portion of the victim's code that would be overwritten with your shellcode and, finally, inject your shellcode.
Use ptrace(PTRACE_GETREGS, pid, NULL, regs), where regs is a pointer to struct user_regs_struct (declared in sys/user.h). The content of the victim's registers would be copied there.
Use ptrace(PTRACE_PEEKTEXT, pid, address_in_victim, NULL) to copy the executable code from the victim (to make a backup) and ptrace(PTRACE_POKETEXT, pid, address_in_victim, shellcode) to write your own, where address_in_victim is what its name suggests (you obtain the initial value from the victim's RIP on 64 bit or EIP on 32 bit systems). shellcode, however, contains bytes of the code being injected packed into an unsigned long value. You would most probably have to make those calls for several iterations, advancing address_in_victim by sizeof(unsigned long) each time, as I do not think your shellcode would be at most 8 bytes.
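The backup-and-overwrite loop could be sketched roughly as follows. This is not the code from the article's source archive: pack_word and inject_code are hypothetical helpers, and the caller is assumed to provide a backup array with at least (len + sizeof(long) - 1) / sizeof(long) entries.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ptrace.h>

/* Hypothetical helper: builds the word to poke at byte offset `off`.
 * Bytes past the end of the shellcode keep the victim's original
 * contents (`old`), so a short tail does not clobber extra code. */
unsigned long pack_word(const unsigned char *code, size_t len, size_t off,
                        unsigned long old)
{
    unsigned long word = old;
    size_t n = len - off;

    if (n > sizeof(word))
        n = sizeof(word);
    memcpy(&word, code + off, n);
    return word;
}

/* Hypothetical helper: saves the victim's words into `backup` and
 * writes the shellcode over them. Returns 0 on success, -1 on error. */
int inject_code(pid_t pid, unsigned long addr, const unsigned char *code,
                size_t len, unsigned long *backup)
{
    size_t off, i = 0;

    for (off = 0; off < len; off += sizeof(long), i++)
    {
        long old;

        errno = 0; /* PEEKTEXT returns data, so -1 alone is ambiguous */
        old = ptrace(PTRACE_PEEKTEXT, pid, (void *)(addr + off), NULL);
        if (-1 == old && 0 != errno)
            return -1;
        backup[i] = (unsigned long)old;
        if (-1 == ptrace(PTRACE_POKETEXT, pid, (void *)(addr + off),
                         (void *)pack_word(code, len, off, backup[i])))
            return -1;
    }
    return 0;
}
```

Restoring the victim later is the same loop with PTRACE_POKETEXT writing the saved backup words back to the same addresses.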
The start of your shellcode will allocate memory for the thread function (unless you are going to run code that already is there).
start:
mov rax, 9 ;sys_mmap
mov rdi, 0 ;requested address
mov rsi, 0x1000 ;one page
mov rdx, 7 ;PROT_READ | PROT_WRITE | PROT_EXEC
mov r10, 0x22 ;MAP_ANON | MAP_PRIVATE
mov r8, -1 ;fd
mov r9, 0 ;offset
syscall
db 0xCC
Increment stage variable. Resume the victim process with
ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
Stage 2 (step = 2)
Ignore all stops until
0xCC == (unsigned char)(ptrace(PTRACE_PEEKTEXT, pid,
ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rip), NULL), NULL) & 0xFF)
which would mean that you have reached your break point. Check victim's rax register for return value
retval = ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rax), NULL);
and abort if it contains an error code.
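For a raw system call such as this sys_mmap, "an error code" means a small negative number: the kernel returns -errno in rax rather than setting errno. A minimal sketch of the check (syscall_failed is a hypothetical helper name):

```c
/* Raw Linux system calls report failure by returning -errno, i.e. a value
 * in the range [-4095, -1] when the register is read as a signed number.
 * Anything else, including a high mmap address, is a success. */
int syscall_failed(unsigned long rax)
{
    return rax >= (unsigned long)-4095L;
}
```

With retval obtained as above, syscall_failed((unsigned long)retval) being non-zero would be the signal to abort, and -(long)retval would be the errno value.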
You have to increment the instruction pointer (RIP/EIP) to step over the breakpoint before letting the victim resume:
ptrace(PTRACE_POKEUSER, pid, offsetof(struct user, regs.rip),
ptrace(PTRACE_PEEKUSER,pid, offsetof(struct user, regs.rip), NULL) + 1);
Increment stage counter and
ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
Stage 3 (step = 3)
After allocating memory, your shellcode should copy the thread function there and, actually, create a thread (similar to this).
You should, again, ignore all stops as long as
0xCC != (unsigned char)(ptrace(PTRACE_PEEKTEXT, pid,
ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rip), NULL), NULL) & 0xFF)
Once you get to this breakpoint, you know that the thread has been initiated and the injector has done what it was written for.
Now you have to restore the victim to its initial, pre-injection state by restoring the values of the registers:
ptrace(PTRACE_SETREGS, pid, NULL, regs);
and, which is even more important - you have to restore the backed up code by copying back the backed up unsigned longs.
The last thing would be detaching from the victim process:
ptrace(PTRACE_DETACH, pid, NULL, NULL);
At this point, your injector may safely exit, letting the victim continue execution.
Voila! You have just injected a thread into another process.
Output of the injector, victim program and the injected thread
P.S. Shared Object Injection (a la DLL injection)
Although injection of executable code is quite simple, injection of a shared object is a different story. Despite the fact that the Linux kernel provides the sys_uselib system call, it may be unavailable on some systems. In such a case, you have several options:
- Check whether the victim uses libdl (the dlopen(), dlsym() and dlclose() functions), parse the image and obtain the addresses of the relevant functions. However, not every program uses libdl.
- Use sys_uselib system call. However, it may be unavailable.
- Write your own shared object loader. This may be a real pain, but you would be able to reuse it whenever you need.
Hope this post was helpful. See you at the next.
Hi
Nice post, there is already work about it (with a different method and similar also to yours) in stealth's papers and jugaad
There is one thing however you forgot to mention that you won't bypass the grsec restrictions of W^X
the mmap call will fail within any respectful/updated/secured ubuntu/fedora/gentoo that has grsec installed
Haven't seen those papers, so thanks for pointing that out.
As to W^X, won't it be enough to call mmap with PROT_READ | PROT_WRITE, copy the thread function and then mprotect with PROT_READ | PROT_EXEC before creating the thread?
I think one would have to do:
- mmap(..., PROT_READ | PROT_WRITE | PROT_MAYREAD | PROT_MAYEXEC, ...)
- generate code into the above area
- mprotect(..., PROT_READ | PROT_EXEC)
Source: http://pax.grsecurity.net/docs/mprotect.txt
Which, in the case of this example, only requires a couple of additional system calls, meaning fewer than 10 lines in Assembly.
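A minimal C sketch of the write-first, execute-later order discussed in these comments, rather than the Assembly the shellcode itself would need. It only shows the plain PROT_READ | PROT_WRITE followed by mprotect(PROT_READ | PROT_EXEC) sequence; whether a PaX/grsecurity kernel accepts it is not something the sketch settles. map_then_protect is a hypothetical name.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copies `len` bytes of code into a fresh anonymous
 * mapping that is writable but not executable, then flips the mapping to
 * executable but not writable. Returns the mapping or NULL on failure. */
void *map_then_protect(const void *code, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (MAP_FAILED == p)
        return NULL;
    memcpy(p, code, len);
    if (-1 == mprotect(p, len, PROT_READ | PROT_EXEC))
    {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

At no point is the page writable and executable at the same time, which is the property W^X policies look for.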