System Programming: virtual machine


Friday, August 31, 2012

Emulation of Hardware. CPU & Memory

There are dozens of hardware platforms (although some people would say that there is only one - the computer ;-) ). Each has its own advantages and disadvantages compared with the others. For example, Intel is the most widely used platform for desktops, while ARM and MIPS are widely used in embedded systems, and so on. Sometimes you may need to test or debug executable code written for a platform other than the one you have access to. For example, what if you have to run ARM code on an Intel-based desktop? In most cases this is not a problem at all, thanks to the large number of available platform emulators (e.g. QEMU and many others). However, even though QEMU is quite a powerful tool, there are certain cases when it is not helpful (at least not without certain modifications).

Note for nerds:

Yes, there are such cases - just because you have not seen one does not mean they do not exist.

The code in this article is for demonstration purposes only - error checks may be omitted, and it may be unoptimized.

Yes, there may be better ways.

Whether forced by current needs or just for fun, you may want to write your own emulator for an existing (or non-existent) platform. You may check this article to see how a simplistic CPU may be designed and implemented. However, the CPU is only a small (although important) part of your emulator. There are many other things you would have to take care of, such as memory, IO devices, etc. Of course, the complexity of the implementation depends on how isolated you want your emulator to be.

As you may understand from the title of this article, we are going to concentrate on the CPU to memory (RAM) interface. It is a good idea to decide in advance how much memory your emulator should support (that is, the width of the address bus). For example, if you are going to support at most 64 kB, then 16-bit addressing is enough. In that case, you may simply allocate a contiguous memory area and access it directly. However, what if you plan to support 1 or 2 or even more gigabytes? That memory would not necessarily be used all at once, but your architecture may allow it. You definitely would not want to make such a huge allocation. This is especially true if the software you are planning to run uses a tiny bit of memory in the lower address space and a tiny bit in the upper, and is itself loaded somewhere in the middle. In this situation, you should implement a kind of paging mechanism, which only allocates pages for addresses that are actually being used.
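For the 64 kB case, a flat array is all you need. The sketch below assumes 16-bit addresses and uses hypothetical names (ram, flat_read_byte, flat_write_byte). Because the address parameter is a uint16_t, every possible address is a valid index, so no bounds check is required.

```c
#include <stdint.h>

/* 64 kB of emulated RAM: one byte for every possible 16-bit address. */
static uint8_t ram[0x10000];

/* A uint16_t address can never exceed 0xFFFF, so it always indexes ram safely. */
static uint8_t flat_read_byte(uint16_t address)
{
    return ram[address];
}

static void flat_write_byte(uint16_t address, uint8_t value)
{
    ram[address] = value;
}
```

Passing a wider value such as 0x10000 converts it to 0 (unsigned conversion is modulo 2^16). This happens to mimic a 16-bit address bus wrapping around.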

Paging

Let's make some definitions to deal with pages:

#define PAGE_SIZE 0x1000 // You may choose to use another size
#define PAGE_MASK 0x0FFF // This depends on the value of PAGE_SIZE (PAGE_SIZE - 1)

typedef struct _page_t
{
    struct _page_t *previous, *next; // Links to neighbouring pages in the sorted list
    unsigned long base;  // Address in the emulated memory represented by this page
    unsigned int flags;  // Whatever flags you want your pages to have
    unsigned char* mem;  // Pointer to the actual allocated memory
}page_t;

The mechanism is quite similar to the paging mechanism used by modern CPUs, with two differences. First, you do not need page tables, since most of the time a simple linked list of pages is enough. Second, you are not mapping virtual memory to physical memory, but emulated memory to the virtual memory accessible to the emulator.

previous and next - pointers to other page_t structures in the linked list of pages;

base - lowest address of the emulated memory represented by this page;

flags - any attributes you would like your pages to have (e.g. is it writable or executable, etc.);

mem - pointer to the memory area actually allocated by the emulator.
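The sketch below connects PAGE_SIZE, PAGE_MASK and base. It shows how an emulated address splits into the base of its page and the offset into page_t->mem, assuming PAGE_SIZE is a power of two. The helper names page_base_of and page_offset_of are hypothetical.

```c
#define PAGE_SIZE 0x1000
#define PAGE_MASK (PAGE_SIZE - 1) /* 0x0FFF; valid only when PAGE_SIZE is a power of two */

/* Base address of the page that contains 'address' (low 12 bits cleared). */
static unsigned long page_base_of(unsigned long address)
{
    return address & ~(unsigned long)PAGE_MASK;
}

/* Offset of 'address' inside its page, i.e. the index into page_t->mem. */
static unsigned long page_offset_of(unsigned long address)
{
    return address & PAGE_MASK;
}
```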

Using such a mechanism reduces overall memory usage, as you only have to allocate the memory areas actually used by the software running on your emulator.

Page Management

It is, of course, up to you how to manage this kind of paging, but, as it seems to me, it may be a good idea to implement a set of functions to manage the sorted (by base) linked list of pages:

page_t* memory_page_alloc(void);

This function simply returns a pointer to a newly allocated page_t structure. Don't forget to allocate a real memory area of PAGE_SIZE bytes and store a pointer to it in page_t->mem.

void memory_page_release(page_t** pg);

This function releases all the resources allocated for a page. This includes the memory which actually represents the page and is pointed to by page_t->mem and the page_t structure itself.

int memory_page_add(page_t** page_list, unsigned long base);

This function allocates a new page representing the emulated memory that starts at base, and inserts it into the sorted linked list of pages.

*page_list - pointer to the first page in the linked list of pages;

base - beginning address of the emulated memory of size PAGE_SIZE.

Its return value should tell you whether a page has been added or an error occurred during memory allocations.
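Here is a minimal sketch of the three functions, built on the page_t structure above. Some conventions are assumptions rather than the article's code. memory_page_add returns 1 when a page was added, 0 when a page with that base already exists, and -1 when allocation failed. It also expects base to be page-aligned already. memory_page_release does not unlink the page, so the caller must do that first.

```c
#include <stdlib.h>

#define PAGE_SIZE 0x1000

typedef struct _page_t
{
    struct _page_t *previous, *next;
    unsigned long base;   /* Lowest emulated address covered by this page */
    unsigned int flags;
    unsigned char* mem;   /* PAGE_SIZE bytes backing the page */
} page_t;

page_t* memory_page_alloc(void)
{
    page_t* pg = (page_t*)calloc(1, sizeof(page_t));
    if (pg == NULL)
        return NULL;
    pg->mem = (unsigned char*)calloc(PAGE_SIZE, 1); /* Zero-filled, like fresh RAM */
    if (pg->mem == NULL) {
        free(pg);
        return NULL;
    }
    return pg;
}

/* Frees the page; the caller must unlink it from the list beforehand. */
void memory_page_release(page_t** pg)
{
    if (pg == NULL || *pg == NULL)
        return;
    free((*pg)->mem);
    free(*pg);
    *pg = NULL; /* Prevent the caller from reusing a dangling pointer */
}

/* 'base' is assumed to be page-aligned. Keeps the list sorted by base. */
int memory_page_add(page_t** page_list, unsigned long base)
{
    page_t *prev = NULL, *cur = *page_list, *pg;

    /* Find the first page whose base is not below the new one */
    while (cur != NULL && cur->base < base) {
        prev = cur;
        cur = cur->next;
    }
    if (cur != NULL && cur->base == base)
        return 0; /* Already mapped */

    pg = memory_page_alloc();
    if (pg == NULL)
        return -1;
    pg->base = base;
    pg->previous = prev;
    pg->next = cur;
    if (cur != NULL)
        cur->previous = pg;
    if (prev != NULL)
        prev->next = pg;
    else
        *page_list = pg; /* New head of the list */
    return 1;
}
```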

Memory Access Emulation

Due to the fact that we are not talking about one consistent array, but rather several separated memory areas (from the emulator's point of view) it makes sense to write a couple of functions that would perform read/write operations from/to the emulated memory.

int memory_read_byte(page_t* pg_list, unsigned long address, unsigned char* byte);

This function reads a single byte from the emulated memory at address and stores it in the location pointed to by byte. It walks the linked list of pages looking for a page where page_t->base <= address && (page_t->base + PAGE_SIZE) > address. If there is no such page, there are two options. The function either allocates the page, adds it to the list and then performs the read, or simply returns an error (which may be helpful for emulating memory access violations). It is up to you to define the behavior of this function in that situation. In fact, you may define an internal flag to enable or disable automatic page allocation.

int memory_write_byte(page_t* pg_list, unsigned long address, unsigned char byte);

This function is almost identical to the one above, except that it writes a single byte to the emulated memory. Its behavior should be the same as memory_read_byte.
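Below is a sketch of both functions using the signatures given above. It picks the "return an error" behavior (0 on success, -1 on an access violation). This is an assumption made partly because the list is passed as page_t* by value. Automatic allocation that had to insert a new first page could not hand the new head back to the caller. If you want that option, pass page_t** instead. The helper memory_find_page is hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE 0x1000

typedef struct _page_t
{
    struct _page_t *previous, *next;
    unsigned long base;
    unsigned int flags;
    unsigned char* mem;
} page_t;

/* Walk the sorted list and return the page that contains 'address', or NULL.
   Sorting lets the walk stop as soon as a page starts above the address. */
static page_t* memory_find_page(page_t* pg_list, unsigned long address)
{
    page_t* pg;
    for (pg = pg_list; pg != NULL && pg->base <= address; pg = pg->next)
        if (address < pg->base + PAGE_SIZE)
            return pg;
    return NULL;
}

/* Returns 0 on success, -1 if no page is mapped at 'address'. */
int memory_read_byte(page_t* pg_list, unsigned long address, unsigned char* byte)
{
    page_t* pg = memory_find_page(pg_list, address);
    if (pg == NULL)
        return -1;
    *byte = pg->mem[address - pg->base];
    return 0;
}

int memory_write_byte(page_t* pg_list, unsigned long address, unsigned char byte)
{
    page_t* pg = memory_find_page(pg_list, address);
    if (pg == NULL)
        return -1;
    pg->mem[address - pg->base] = byte;
    return 0;
}
```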

Being able to transfer only one byte at a time is not very convenient, so you are more than welcome to implement functions for larger transfers. However, you need to be careful when such a transfer spans two pages, and check that both pages are allocated (meaning accessible).
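Writes are where the two-page check really matters. If the second page is missing, you do not want half of the value already written. Here is a sketch of a hypothetical memory_write_dword, assuming little-endian byte order and a page size of at least 4 bytes. Under that assumption a 4-byte transfer touches at most two pages. Both pages are looked up before anything is modified.

```c
#include <stddef.h>

#define PAGE_SIZE 0x1000

typedef struct _page_t
{
    struct _page_t *previous, *next;
    unsigned long base;
    unsigned int flags;
    unsigned char* mem;
} page_t;

static page_t* memory_find_page(page_t* pg_list, unsigned long address)
{
    page_t* pg;
    for (pg = pg_list; pg != NULL && pg->base <= address; pg = pg->next)
        if (address < pg->base + PAGE_SIZE)
            return pg;
    return NULL;
}

/* Writes a 32-bit value in little-endian order. The pages holding the first and
   the last byte are checked up front, so a write that straddles into an
   unmapped page fails without modifying memory. Returns 0 or -1. */
int memory_write_dword(page_t* pg_list, unsigned long address, unsigned long value)
{
    page_t* first = memory_find_page(pg_list, address);
    page_t* last = memory_find_page(pg_list, address + 3);
    int i;

    if (first == NULL || last == NULL)
        return -1;
    for (i = 0; i < 4; i++) {
        unsigned long a = address + i;
        page_t* pg = (a < first->base + PAGE_SIZE) ? first : last;
        pg->mem[a - pg->base] = (unsigned char)(value >> (8 * i));
    }
    return 0;
}
```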

Of course, there are many more things to emulate, such as IO devices and possibly network adapters, but memory is the most important one. Those, however, go far beyond the scope of this article.

Hope this article was informative. See you at the next.

Wednesday, May 23, 2012

Passing Events to a Virtual Machine

The source code for this article may be found here.

Virtual machines and software frameworks are an integral part of our digital life. There are complex VMs and simple software frameworks. These two articles (Simple Virtual Machine and Simple Runtime Framework by Example) show how easy it may be to implement one yourself. I did my best to describe the way VM code may interact with native code and the Operating System. However, the reverse interaction is still left unexplained. This article is going to fix this omission.

As usual - note for nerds:

The source code given in this article is for example purposes only. I know that this framework is far from perfect; therefore, this article is not a how-to or tutorial - just an explanation of the principle. Error checks are omitted on purpose. If you want to implement a real framework, do it yourself, including error checks.

By "VM's code" I do not mean the implementation of the virtual machine, but the pseudo code that runs inside it.

Architecture Overview

Needless to say, the ability to pass events/signals to code executed by the virtual machine implies a more complex VM architecture. All previous examples were based on a single function responsible for execution. Adding events means not only adding another function, but also introducing threads into our implementation.

At least two threads are needed:

Actual VM - this thread is responsible for executing the VM's code and dispatching the event queue (processor);
Event Listener - this thread is responsible for collecting relevant events from the Operating System and adding them to the VM's event queue (listener).

Fig.1 VM Architecture with Event Listener

You may see that the Core() function in the attached source code creates an additional thread.

Event Listener

This thread collects events from the Operating System (mouse move, key up/down, etc.) and adds a new entry to the list of EVENT structures.

typedef struct _EVENT
{
    struct _EVENT* next_event; // Pointer to the next event in the queue
    int code;                  // Code of the event
    unsigned int data;         // Either unsigned int data or the address of the buffer
                               // containing information to be passed to the handler
}EVENT;

The code for the listener is quite simple:

while(WAIT_TIMEOUT == WaitForSingleObject(processor_thread, 1))
{
    // Check for events from the OS
    if(event_present)
    {
        // Prepare the entry outside the lock; only the queue update needs protection
        event = (EVENT*)malloc(sizeof(EVENT));
        event->code = whatever_code_is_needed;
        event->data = whatever_data_is_relevant;
        event->next_event = NULL; // Terminate the entry before it becomes visible to the processor
        EnterCriticalSection(&cs);
        add_event(event_list, event);
        LeaveCriticalSection(&cs);
    }
}

The code is self-explanatory. First, it checks for available events (this part is omitted and replaced by a comment). If there is a new event to pass to the VM, it adds it to the queue. In this example event collection is implemented as a loop. In real life, you may implement it with callbacks and use the loop above just to wait for the processor thread to exit.
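The listener calls add_event, but its body is not shown. Here is a minimal sketch, assuming event_list is a pointer to the queue's head pointer (EVENT**, e.g. &events). The event is appended at the tail so that the processor, which dequeues from the head, handles events in arrival order. The caller is expected to hold the critical section.

```c
#include <stddef.h>

typedef struct _EVENT
{
    struct _EVENT* next_event;
    int code;
    unsigned int data;
} EVENT;

/* Appends 'event' to the tail of the queue (FIFO order). */
void add_event(EVENT** event_list, EVENT* event)
{
    EVENT** tail = event_list;

    event->next_event = NULL;
    while (*tail != NULL)            /* Walk to the last 'next_event' slot */
        tail = &(*tail)->next_event;
    *tail = event;
}
```

Walking a pointer-to-pointer removes the special case for an empty queue. If the queue can grow long, keeping a separate tail pointer would make insertion O(1).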

Processor

Obviously, the "processor" thread is going to be a bit more complicated than in the previous article (Simple Runtime Framework by Example). In addition to running the run_opcode(CPU**) function, it has to check for pending events and pass control to the associated handler in the VM code.

typedef struct _EVENT_HANDLER
{
    struct _EVENT_HANDLER* next_handler; // Pointer to the next handler
    int event_code;                      // Code of the event
    unsigned int handler_base;           // Address of the handler in the VM's code
}EVENT_HANDLER;

DWORD WINAPI RunningThread(void* param)
{
    CPU* cpu = (CPU*)param;
    EVENT* event;
    EVENT_HANDLER* handler;
    do{
        EnterCriticalSection(&cs);
        if(NULL != events)
        {
            // Dequeue the oldest event
            event = events;
            events = events->next_event;
            // Look up the handler registered for this event code
            handler = handlers;
            while(NULL != handler && event->code != handler->event_code)
                handler = handler->next_handler;
            if(NULL != handler)
            {
                // Save current context by pushing VM registers to VM's stack
                cpu->regs[REG_A] = (unsigned int)event->code;
                cpu->regs[REG_B] = event->data;
                cpu->regs[REG_IP] = handler->handler_base;
            }
            // Events without a registered handler are simply dropped
            free(event);
        }
        LeaveCriticalSection(&cs);
    }while(0 != run_opcode(&cpu));
    return cpu->regs[REG_A];
}

We are almost done. Our framework already knows how to pass events to the correct handler in the VM's code. Two things remain to be covered: registering a handler and returning from a handler.

Returning from Handler

Since an event handler is not a regular routine, we cannot return from it using the regular RET instruction. Instead, let's introduce another instruction - IRET. An event actually interrupts the execution flow of the program, so IRET (interrupt return) is exactly what we need. The source code that handles this instruction is so simple that there is no need to give it in the text of the article. All it does is restore the context of the VM's code by popping the registers previously pushed onto the stack.
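For readers who want to see the pairing spelled out, here is one possible sketch of the context save done by the processor thread and its IRET counterpart. Several details are assumptions, not the attached source. The register indices follow the order A, B, C, D, SP, IP. regs[REG_SP] is taken to be an index into cpu->stack, and the stack is assumed to grow downwards. The names save_context, iret, vm_push and vm_pop are hypothetical.

```c
/* Hypothetical register indices: A, B, C, D, SP, IP. */
enum { REG_A, REG_B, REG_C, REG_D, REG_SP, REG_IP, REG_COUNT };

typedef struct
{
    unsigned int regs[REG_COUNT];
    unsigned int* stack;
} CPU;

/* Assumption: regs[REG_SP] indexes cpu->stack and the stack grows down. */
static void vm_push(CPU* cpu, unsigned int value)
{
    cpu->stack[--cpu->regs[REG_SP]] = value;
}

static unsigned int vm_pop(CPU* cpu)
{
    return cpu->stack[cpu->regs[REG_SP]++];
}

/* Done by the processor thread before jumping to a handler. */
void save_context(CPU* cpu)
{
    int r;
    for (r = REG_A; r <= REG_D; r++)
        vm_push(cpu, cpu->regs[r]);
    vm_push(cpu, cpu->regs[REG_IP]); /* Where execution resumes after IRET */
}

/* What the IRET opcode would do: pop in the reverse order of save_context.
   SP is restored implicitly, since every push is matched by a pop. */
void iret(CPU* cpu)
{
    int r;
    cpu->regs[REG_IP] = vm_pop(cpu);
    for (r = REG_D; r >= REG_A; r--)
        cpu->regs[r] = vm_pop(cpu);
}
```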

Registering an Event Handler

The last thing left is to "teach" the program written in pseudo assembly to register a handler for a given event type. In order to do this, we need to add one simple system call - SYS_ADD_LISTENER. This system call accepts two parameters:

Code of the event to handle;
Address of the handler function.

loadi A, 0 ;Code of the event

loadi B, handler ;Address of the handler subroutine

_int sys_add_listener ;Register the handler
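On the framework side, the system call only has to add an entry to the handlers list that the processor thread walks. Below is a sketch of a hypothetical sys_add_listener. It takes the event code from register A and the handler address from register B, as in the pseudo assembly above. Re-registering the same code replaces the old handler, which is a design choice assumed here. Because the processor thread reads the same list, call it with the critical section held.

```c
#include <stdlib.h>

typedef struct _EVENT_HANDLER
{
    struct _EVENT_HANDLER* next_handler;
    int event_code;
    unsigned int handler_base;
} EVENT_HANDLER;

/* Registers 'address' as the handler for 'code'. Returns 0 or -1 on failure. */
int sys_add_listener(EVENT_HANDLER** handlers, int code, unsigned int address)
{
    EVENT_HANDLER* h;

    for (h = *handlers; h != NULL; h = h->next_handler) {
        if (h->event_code == code) {
            h->handler_base = address; /* Replace an existing registration */
            return 0;
        }
    }
    h = (EVENT_HANDLER*)malloc(sizeof(EVENT_HANDLER));
    if (h == NULL)
        return -1;
    h->event_code = code;
    h->handler_base = address;
    h->next_handler = *handlers; /* Prepend: lookup order does not matter */
    *handlers = h;
    return 0;
}
```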

Example Code

The example code attached to this article is the implementation of all of the above. It does the following:

Registers event handler;
Enters an infinite loop printing out '.' every several milliseconds;
The first thread waits a bit and generates an event;
Event handler terminates the infinite loop and returns;
The program prints out a message and exits.

I hope this post was helpful or, at least, interesting.

See you at the next.

Saturday, May 19, 2012

Simple Runtime Framework by Example

Source code for this article may be found here.

These days we are simply surrounded by different software frameworks. Just to name a few: Java, .Net and, actually, many more. Have you ever wondered how those work or have you ever wanted or needed to implement one? In this article, I will cover a simple or even trivial runtime framework.

As usual - note for nerds:

Now, let's get down to business.

Software Framework

Wikipedia gives the following definition of the term "Software Framework" - "A software framework is a universal, reusable software platform used to develop applications, products and solutions. Software Frameworks include support programs, compilers, code libraries, an application programming interface (API) and tool sets that bring together all the different components to enable development of a project or solution". As you can see, a software framework is quite a complex thing. However, let's simplify it and see how it basically works.

Figure 1.
Software Framework

The diagram on the left may give you a good understanding of what a software framework is and what role it plays. Simply put, it is a shim between the user application and the Operating System. There are at least two types of software frameworks:

Application Programming Interface (API) - if we take a look at the Windows API, we may see that it is a framework as well. However, it may be bypassed, or a programmer may at least choose to reduce interaction with it. For example, they may use functions from ntdll.dll instead of those provided by kernel32.dll, or even "talk" to the Windows kernel directly through interrupts (strongly discouraged, but sometimes unavoidable).
.Net-like framework - total isolation of user code from the operating system. Such frameworks are mostly virtual machines that completely isolate the user application from the operating system and hardware. However, such a framework has to provide the application with all the services available in the Operating System. This is the type of framework we are going to build in this article.

Virtual Machine

The basics of building a simple virtual machine are covered in this article, so I will only give a brief explanation here. Our VM in this example consists of the following components:

Virtual CPU
A structure that represents a CPU - basically, it has 6 registers and a pointer to the stack:

typedef struct
{
unsigned int regs[6];
unsigned int* stack;
}CPU;

The 6 registers are the general purpose registers A, B, C and D, the STACK POINTER (SP) and the INSTRUCTION POINTER (IP). A also stores the system call return value, and C is used as the counter for the LOOP instruction.
Instruction Interpreter
A function or a set of functions responsible for interpreting the pseudo assembly (or call it intermediate assembly language) designed for this virtual machine (in this case, 14 instructions).
System Call Handler
This component provides the means for the user application to interact with the Operating System (in this case 2 system calls: sys_write and sys_exit).
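Below is a sketch of what the system call handler might look like for the two calls. Several things here are assumptions. SYS_WRITE/SYS_EXIT are hypothetical names whose numeric values are made up; the real constants live in asm.asm. The register indices follow the order A, B, C, D, SP, IP. The register convention for sys_write is read off the example program: A = output stream, B = buffer, C = length. Buffer addresses are treated as offsets into a VM memory block mem rather than as raw pointers. Returning 0 tells the interpreter loop to stop, like run_opcode.

```c
#include <stdio.h>

enum { REG_A, REG_B, REG_C, REG_D, REG_SP, REG_IP, REG_COUNT };
enum { SYS_WRITE = 0, SYS_EXIT = 1 }; /* Hypothetical system call numbers */

typedef struct
{
    unsigned int regs[REG_COUNT];
    unsigned int* stack;
} CPU;

/* Dispatches a system call. Returns nonzero to keep running, 0 to stop. */
int handle_syscall(CPU* cpu, const unsigned char* mem, unsigned int number)
{
    switch (number) {
    case SYS_WRITE: {
        /* A = stream (1 = stdout, 2 = stderr), B = buffer offset, C = length */
        FILE* out = (cpu->regs[REG_A] == 2) ? stderr : stdout;
        size_t written = fwrite(mem + cpu->regs[REG_B], 1, cpu->regs[REG_C], out);
        cpu->regs[REG_A] = (unsigned int)written; /* Return value goes to A */
        return 1;
    }
    case SYS_EXIT:
        return 0;
    default:
        cpu->regs[REG_A] = (unsigned int)-1; /* Unknown call: report an error in A */
        return 1;
    }
}
```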

Core Function

The name of the function speaks for itself. This is the first function of the framework implementation which gains control. In this particular case, it does not have too many things to do - initialization of the virtual CPU and execution of the command interpreter, until the user application exits (signals the framework to terminate the execution).

Implementation

It is a common practice to implement a framework as a DLL (dynamic link library), for example, mscoree.dll - the core of the .Net framework. I do not see any reason to reinvent the wheel, therefore, this framework will be implemented as a DLL as well.

All is fine, you may say, but how should we pass the compiled pseudo assembly code to the framework? Well, I bet, most of you know how to do that. In case you don't - no worries, just keep reading.

In the case of the .Net framework (at least as far as I know), the loader identifies a file as a .Net executable, reads the meta header, and initializes mscoree.dll appropriately. We will not go through all those complications and will use a regular PE file:

Figure 2.

Customized PE file.

PE Header - regular PE Header, no modification needed;
Code Section - simply invokes the core function of the framework:

push pseudo_code_base_address
call [core]
Import Section - a regular import section that imports only one function from framework.dll - Core(unsigned int);
Data Section - this section contains the actual compiled pseudo assembly code and whatever headers you may come up with, that may instruct the core() function to correctly initialize the application.

Example Executable Source Code

The following is the source code of the example executable. It may be compiled with FASM (Flat Assembler).

include 'win32a.asm' ;we need the 'import' macro
include 'asm.asm' ;pseudo assembly commands and constants

format PE console
entry start

section '.text' readable executable
start:
push _base
call [core_func]

section '.idata' data import writeable
library framework, 'framework.dll'
import framework,\
core_func, 'Core'

section '.data' readable writeable
_base:
loadi A, _base
loadi B, 0x31
_add A, B
loadr B, A
loadi A, _data.string
loadi C, _data.string_len
_call _func
loadi A, 1
loadi B, _data.string
loadi C, _data.str_len
_int sys_write
loadi A, 1
loadi B, _data.msg
loadi C, _data.msg_len
_int sys_write
_int sys_exit

_func:
; A = string address
; B = key
; C = counter
.decode:
loadr D, A
xorr D, B
storr A, D
loadi D, 4
_add A, D
_loop .decode
_ret

_data:
.string db 'Hello, developer!', 10, 13
.str_len = $-.string
db 0
.string_len = ($-.string)/4
.msg db 'The program will now exit.', 10, 13
.msg_len = $-.msg

;Encrypt one string
load k dword from _base + 0x31
repeat 5
load a dword from _data.string + (% - 1) * 4
a = a xor k
store dword a at _data.string + (% - 1) * 4
end repeat

The code above produces a tiny executable which invokes framework's core() function. Pseudo assembly code simply prints two messages (the first one is decoded prior to being printed). Full sources are attached to this article (see the very first line).
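To make the pseudo assembly routine _func easier to follow, here is a C equivalent (the name xor_dwords is hypothetical). It XORs count consecutive 32-bit words with the key. That is why _data.string_len is computed as ($-.string)/4, and why the string is padded with db 0 to 20 bytes. memcpy is used instead of a cast so the sketch does not assume aligned buffers. The mapping to the pseudo instructions is shown in the comments.

```c
#include <stdint.h>
#include <string.h>

/* XOR 'count' consecutive 32-bit words at 'buf' with 'key'.
   Applying it twice restores the original data. */
void xor_dwords(unsigned char* buf, uint32_t key, unsigned int count)
{
    unsigned int i;
    for (i = 0; i < count; i++) {                /* _loop .decode, C = count */
        uint32_t word;
        memcpy(&word, buf + 4 * i, sizeof word); /* loadr D, A */
        word ^= key;                             /* xorr D, B */
        memcpy(buf + 4 * i, &word, sizeof word); /* storr A, D */
    }                                            /* loadi D, 4 / _add A, D */
}
```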

The good thing is that you do not have to start the interpreter and load this executable (or specify it as a command line parameter) - you may simply run this executable, Windows loader will bind it with the framework.dll automatically. The bad thing is that you would, most probably, have to write your own compiler, because writing assembly is fun, dealing with pseudo assembly is fun as well, BUT, only when done for fun. It is not as pleasant when dealing with production code.

Possible uses

Unless you are trying to create a framework that competes with existing software frameworks, you may use this approach to increase the protection of your applications. For example, you may virtualize cryptographic algorithms or any other part of your program where execution speed is not critical but which represents sensitive intellectual property.

Hope you find this article helpful.

See you at the next!

Thursday, December 22, 2011

Simple Virtual Machine

Sample code for this article may be found here.

In computing, a Virtual Machine (VM) is a software implementation of either an existing or a fictional hardware platform. VMs are generally divided into two classes - system VMs (capable of running an operating system) and process VMs (roughly speaking, those that can only run a single executable). Anyway, if you are just interested in the definition of the term, you had better go here.

There are tons of articles dedicated to this matter on the Internet, and hundreds of tutorials and explanations. I see no reason to add yet another "trivial" article or tutorial to the pile. Instead, I think it may be more interesting to see it in action, with an example of a real application. One may say that we are surrounded by such examples - Java, .NET, etc. That is true; however, I would like to touch on a slightly different application of this technology - protecting your software/data from being hacked.

Data Protection

Software (or content) vendors spend millions of dollars trying to protect their products from being stolen or used in any other illegal way. There are numerous protection tools and utilities, ranging from simple packers/scramblers to complex packages that implement multilevel encryption and virtual machines as well. You may disagree, but you won't convince me otherwise: an out-of-the-box solution is only good until it gains popularity. There is enough evidence for this statement. In my opinion, no one can protect your software better than you. It only depends on how well protected you want it to be.

Although there are numerous protection methods and techniques, we are going to concentrate on a virtual machine for data encoding/decoding. Nothing special, just a trivial XOR method, but, in my opinion, enough to demonstrate the fundamentals.

Design Your VM

In real life hardware design precedes its software counterpart, but we may allow ourselves to do it in reverse order (it is our own VM, after all). Therefore, we will begin with the pseudo executable file format that our VM will support.

Pseudo Executable File Format

Well, it is a good idea to put a header at the beginning of the file. To do so, we have to think about what our file is going to contain. The file may be raw code (remember DOS com files?), but this would not be interesting enough. So, let our file be divided into three sections:

code section - this section would contain code written in our pseudo assembly language (we'll cover it a bit later);
data section - this section would contain all the data needed by our pseudo executable (PE :-) );
export section - this section would contain references to all the elements that we want to make visible to the core program.

Let us define the header as a C structure:

typedef struct _VM_HEADER
{
    unsigned int version;        /* Version of our VM. Will be 0x101 for now */
    unsigned int codeOffset;     /* File offset of the code section */
    unsigned int codeSize;       /* Size of the code section in bytes */
    unsigned int dataOffset;     /* File offset of the data section */
    unsigned int dataSize;       /* Size of the data section in bytes */
    unsigned int exportOffset;   /* File offset of the export section */
    unsigned int exportSize;     /* Size of the export section in bytes */
    unsigned int requestedStack; /* Required size of stack in 4 bytes blocks */
    unsigned int fileSize;       /* Size of the whole file in bytes */
}VM_HEADER;
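Before trusting any of these offsets, a loader should check them against the actual file. Here is a sketch of a hypothetical vm_read_header. It assumes a 4-byte unsigned int and a little-endian host, matching the dd values Flat Assembler emits. It rejects a wrong version, a fileSize that disagrees with the buffer length, and any section that does not fit inside the file. The bounds test is written so that offset + size cannot overflow.

```c
#include <stddef.h>
#include <string.h>

typedef struct _VM_HEADER
{
    unsigned int version, codeOffset, codeSize, dataOffset, dataSize;
    unsigned int exportOffset, exportSize, requestedStack, fileSize;
} VM_HEADER;

/* Returns 1 if the section [offset, offset + size) lies inside the file. */
static int section_fits(unsigned int offset, unsigned int size, size_t file_size)
{
    return offset <= file_size && size <= file_size - offset;
}

/* Copies the header out of 'buf' and validates it. Returns 0 on success, -1 on error. */
int vm_read_header(const unsigned char* buf, size_t len, VM_HEADER* hdr)
{
    if (len < sizeof(VM_HEADER))
        return -1;
    memcpy(hdr, buf, sizeof(VM_HEADER)); /* Avoids unaligned access to 'buf' */
    if (hdr->version != 0x101 || hdr->fileSize != len)
        return -1;
    if (!section_fits(hdr->codeOffset, hdr->codeSize, len) ||
        !section_fits(hdr->dataOffset, hdr->dataSize, len) ||
        !section_fits(hdr->exportOffset, hdr->exportSize, len))
        return -1;
    return 0;
}
```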

Well, one more thing, and actually the most important one: we need a compiler for our pseudo assembly that can output files in this format. Fortunately, we do not have to write one (although this may be an interesting task). Tomasz Grysztar has done wonderful work with his Flat Assembler. This assembler is intended for Intel assembly code, but thanks to its powerful macro instruction support we can adapt it to our needs. The skeleton source for our file would look like this:

include 'defs.asm' ;Definitions of our pseudo assembly instructions

org 0

; Header =======================
h_version   dd 0x101
h_code      dd _code
h_code_size dd _code_size
h_data      dd _data
h_data_size dd _data_size
h_exp       dd _export
h_exp_size  dd _export_size
h_stack     dd 0x40
h_size      dd size

; Code =========================
_code:
_function:
;some pseudo code here
_code_size = $ - _code

; Data =========================
_data:
;some data here
_data_size = $ - _data

; Export =======================
_export:
;export table structures here
_export_size = $ - _export

size = $ - h_version

As simple as that.

Export section deserves special attention. I tried to make it as easy to use as possible. It is divided into two parts:

Array of file offsets of export entries terminated by 0;
Export entries:

File offset of the exported function/variable (4 bytes);
Public name of the exported object (NULL terminated ASCII string);

In the above example, the export section would look like this:

; Array of file offsets
dd _f1 ; Offset of '_f1' export entry
dd 0   ; Terminating 0

; List of export entries
_f1 dd _function              ; File offset
db 'exported_function_name',0 ; Public name

Save the file as 'something.asm' or whatever name you prefer. Compile it with Fasm.
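On the VM side, resolving a name against this layout means walking the zero-terminated array of entry offsets and comparing the name stored after each entry's 4-byte file offset. Below is a sketch of a hypothetical vm_find_export, assuming the whole file is loaded into one buffer (with org 0, file offsets are then buffer offsets). It checks that every entry and name stays inside the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Reads a 32-bit little-endian value without assuming alignment. */
static unsigned int read_u32(const unsigned char* p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8) |
           ((unsigned int)p[2] << 16) | ((unsigned int)p[3] << 24);
}

/* Looks up 'name' in the export section at 'export_offset' of the loaded file.
   On success stores the object's file offset in *offset and returns 0; else -1. */
int vm_find_export(const unsigned char* file, size_t file_size,
                   unsigned int export_offset, const char* name, unsigned int* offset)
{
    size_t pos;

    for (pos = export_offset; pos + 4 <= file_size; pos += 4) {
        unsigned int entry = read_u32(file + pos); /* Offset of an export entry */
        const char* entry_name;
        if (entry == 0)
            break; /* Terminating 0 of the offset array */
        if ((size_t)entry + 4 >= file_size)
            return -1; /* Malformed: no room for the entry and its name */
        entry_name = (const char*)file + entry + 4;
        if (memchr(entry_name, 0, file_size - entry - 4) == NULL)
            return -1; /* Name is not NUL-terminated inside the file */
        if (strcmp(entry_name, name) == 0) {
            *offset = read_u32(file + entry);
            return 0;
        }
    }
    return -1;
}
```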

Pseudo Assembly Language

Now that we are done with the file format, we have to define our pseudo assembly language. This includes both the definition of commands and instruction encoding. As this VM is designed only to encode/decode a short text message, there is no need to develop a full-scale instruction set. All we need is MOV, XOR, ADD, LOOP and RET.

Before we start writing macros that represent these commands, we have to think about instruction encoding. This is not going to be difficult - we are not Intel. For simplicity, all our instructions will be two bytes long, followed by one or more immediate arguments if there are any. This allows us to encode all the needed information, such as opcode, type of arguments, size of arguments and operation direction:

typedef struct _INSTRUCTION
{
    unsigned short opCode:5;    /* Opcode value */
    unsigned short opType1:2;   /* Type of the first operand if present */
    unsigned short opType2:2;   /* Type of the second operand if present */
    unsigned short opSize:2;    /* Size of the operand(s) */
    unsigned short reg1:2;      /* Index of the register used as first operand */
    unsigned short reg2:2;      /* Index of the register used as second operand */
    unsigned short direction:1; /* Direction of the operation */
}INSTRUCTION;

Define the following constants:

/* Operand types */
#define OP_REG  0 /* Register operand */
#define OP_IMM  1 /* Immediate operand */
#define OP_MEM  2 /* Memory reference */
#define OP_NONE 3 /* No operand (optional) */

/* Operand sizes */
#define _BYTE  0
#define _WORD  1
#define _DWORD 2

/* Operation direction */
#define DIR_LEFT  0
#define DIR_RIGHT 1

/* Instructions (OpCodes) */
#define MOV  1
#define MOVI 7
#define ADD  2
#define SUB  3
#define XOR  4
#define LOOP 5
#define RET  6
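C leaves the bit layout of bit-fields to the compiler, so a VM that reads the 16-bit instruction word from a file may prefer explicit shifts. The sketch below assumes the fields are packed from the least significant bit, in the order they appear in INSTRUCTION (opCode in bits 0-4, ..., direction in bit 15). That is the usual layout on x86 compilers, but verify it against your macros. DECODED, decode_instruction and encode_instruction are hypothetical names.

```c
#include <stdint.h>

typedef struct
{
    unsigned opCode, opType1, opType2, opSize, reg1, reg2, direction;
} DECODED;

/* Assumed layout: fields packed LSB-first in declaration order. */
DECODED decode_instruction(uint16_t w)
{
    DECODED d;
    d.opCode    =  w        & 0x1F; /* bits 0-4   */
    d.opType1   = (w >> 5)  & 0x03; /* bits 5-6   */
    d.opType2   = (w >> 7)  & 0x03; /* bits 7-8   */
    d.opSize    = (w >> 9)  & 0x03; /* bits 9-10  */
    d.reg1      = (w >> 11) & 0x03; /* bits 11-12 */
    d.reg2      = (w >> 13) & 0x03; /* bits 13-14 */
    d.direction = (w >> 15) & 0x01; /* bit 15     */
    return d;
}

/* Inverse of decode_instruction; out-of-range field values are masked off. */
uint16_t encode_instruction(DECODED d)
{
    return (uint16_t)((d.opCode & 0x1F) | (d.opType1 & 3) << 5 | (d.opType2 & 3) << 7 |
                      (d.opSize & 3) << 9 | (d.reg1 & 3) << 11 | (d.reg2 & 3) << 13 |
                      (d.direction & 1) << 15);
}
```

With 5 bits for the opcode there is room for 32 opcodes, so the 7 values defined above leave plenty of space for new instructions.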

It seems to me that there is no reason to put all the macros defining our pseudo assembly opcodes here, as it would be a waste of space. I will just put one here as an example. This will be the definition of MOV instruction:

Constants to be used with our pseudo assembly language

Macro defining the MOV instruction

As you can see in the code above, I've been lazy again and decided that it would be easier to specify the size of the arguments explicitly, rather than writing some extra code to detect their size automatically. In addition, the name of the instruction tells what that specific instruction is intended to do. For example, mov_rm moves a value from memory to a register, and the letters 'r' and 'm' tell which types of arguments are in use (register, memory). Thus, moving a WORD from memory to a register would look like this:

mov_rm REG_A, address, _WORD

And the whole code section (currently containing only one function) is represented by the image below:

The function loads the address of the message as an immediate value into register B. It then loads the length of the message from the address described by message_len into register C, and iterates message_len times, applying XOR to every byte of the message. "mov_rmi" performs the same operation as "mov_rm", but the address is taken from the register specified as the second parameter.
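For comparison, here is the same byte-wise XOR loop written in C (the name xor_buffer is hypothetical, and the key is passed as a parameter because the article does not show it). Since (x ^ k) ^ k == x, the same call both encodes and decodes.

```c
#include <stddef.h>

/* Byte-wise XOR of 'len' bytes with 'key'; encoding and decoding are the same call. */
void xor_buffer(unsigned char* buf, size_t len, unsigned char key)
{
    size_t i;
    for (i = 0; i < len; i++) /* One iteration per LOOP step, C = len */
        buf[i] ^= key;
}
```

As a check: XORing "Hello, World!" with the key 0x1E produces "V{rrq2>Iqlrz?", the same string as the one given in the P.S. at the end of this post.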

This is what the output looks like in IDA Pro:

Header

Code

Data and Export sections

Virtual Machine

Alright, now that we have some sort of "compiler", we may start working on the VM itself. First of all, let us define a structure that represents our virtual CPU:

typedef struct _VCPU
{
    unsigned int registers[4]; /* Four registers */
    unsigned int *stackBase;   /* Pointer to the allocated stack */
    unsigned int *stackPtr;    /* Pointer to the current position in stack */
    unsigned int ip;           /* Instruction pointer */
    unsigned char *base;       /* Pointer to the buffer where our pseudo
                                  executable is loaded to */
}VCPU;

registers - general purpose registers. There is no need for any additional register in this VM's CPU;

stackBase - pointer to the beginning of the allocated region which we use as stack for our VM;

stackPtr - this is our stack pointer;

ip - instruction pointer. Points to the next instruction to be executed. It must never point outside the buffer containing our pseudo executable;

base - pointer to the buffer which contains our executable. You may say that this is the memory of our VM.

In addition, you should implement at least some functions for the following:

allocate/free the virtual CPU
load the pseudo executable into the VM's memory and set up the stack
a function to retrieve either a file offset or a normal pointer to an object exported by the pseudo executable
a function to set the instruction pointer (although this may be done by directly accessing the ip field of the virtual CPU)
a function that runs our pseudo code.
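Here is a sketch of the loading and stack parts of that list, built on the VCPU structure above. The functions vcpu_load, vcpu_push, vcpu_pop and vcpu_free are hypothetical. It assumes a downward-growing stack of stack_words 4-byte slots (the header's requestedStack), with stackPtr starting one past the last slot. Push and pop refuse to overflow or underflow instead of corrupting memory.

```c
#include <stdlib.h>
#include <string.h>

typedef struct _VCPU
{
    unsigned int registers[4];
    unsigned int *stackBase;
    unsigned int *stackPtr;
    unsigned int ip;
    unsigned char *base;
} VCPU;

/* Copies the image into VM memory and allocates the stack. Returns 0 or -1. */
int vcpu_load(VCPU* cpu, const unsigned char* image, size_t size, unsigned int stack_words)
{
    memset(cpu, 0, sizeof *cpu);
    cpu->base = (unsigned char*)malloc(size);
    cpu->stackBase = (unsigned int*)calloc(stack_words, sizeof(unsigned int));
    if (cpu->base == NULL || cpu->stackBase == NULL) {
        free(cpu->base);
        free(cpu->stackBase);
        return -1;
    }
    memcpy(cpu->base, image, size);
    cpu->stackPtr = cpu->stackBase + stack_words; /* Empty, grows downwards */
    return 0;
}

int vcpu_push(VCPU* cpu, unsigned int value)
{
    if (cpu->stackPtr == cpu->stackBase)
        return -1; /* Stack overflow */
    *--cpu->stackPtr = value;
    return 0;
}

/* 'stack_words' must be the value passed to vcpu_load. */
int vcpu_pop(VCPU* cpu, unsigned int stack_words, unsigned int* value)
{
    if (cpu->stackPtr == cpu->stackBase + stack_words)
        return -1; /* Stack underflow */
    *value = *cpu->stackPtr++;
    return 0;
}

void vcpu_free(VCPU* cpu)
{
    free(cpu->base);
    free(cpu->stackBase);
    memset(cpu, 0, sizeof *cpu);
}
```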

In my case, the final source looks like this:

I decided not to cite the VM's code here, as you should be able to write it yourself if the subject is interesting enough for you. Although the code in this article does not contain any checks for correct return values, you should take care of them.

Summary

Although this article describes a trivial virtual machine that can only encode/decode a fixed-length buffer, the concept itself may serve you well in software/data protection, as hacking into a VM is several times harder than cracking native code.

One more thing to add. Our design allows us to call procedures provided by the pseudo executable, but not the other way around. There are several ways to let the pseudo executable "talk to us"; the simplest (as it seems to me) is to implement interrupts.

I hope I've covered it. I would appreciate comments and/or suggestions.

P.S. The encoded result would be "V{rrq2>Iqlrz?".

See you at the next post!
