Monday, December 19, 2011

Listing Loaded Shared Objects in Linux

I have recently come across several posts on the Internet where people keep asking for Linux analogs of Windows API functions. One of the most frequent requests is something like "EnumProcessModules for Linux". As usual, most of the replies look like "why do you need that?" or "Linux is not Windows". Although the latter is totally true, it is completely useless. As to "why do you need that?" - why do you care? The poor guy's asking a question here, so let's assume he knows what he's doing.

I remember looking for something like this myself while working on a virtualization project for one of my previous employers. One thing I've learnt: once the question is out of the ordinary (and people do not usually ask for Windows API replacements in Linux), there is a really good chance of getting tons of useless replies and being blamed for being unclear. More than that, when it comes to Linux, most people do not really understand the difference between doing something in the shell and doing something in your program (as, unfortunately, many call shell scripts programs as well).

Well, enough crying here. Let's get to business. As usual, a note for nerds (non nerds are welcome to comment, leave suggestions, etc.)

  • the code in this article may not contain all necessary checks for invalid values;
  • yes, there are other ways of doing this;
  • you are going to mess with libc here, so be careful.

What are Modules (in this case)
In Linux the word "module" has a different meaning from what you've been used to in Windows. While in Windows this word means the components of a process (the main executable and all loaded DLLs), in Linux it refers to a part of the kernel (usually a driver). If this is what you mean, then you probably want to enumerate loaded kernel modules, and that is beyond the scope of this article. What we are going to do here is write the paths of all loaded shared objects (the Linux analog of Windows DLLs) to the terminal, and we are going to do it in a less common way just to see how things are organized internally. Just like we have the LDR_MODULE structure in Windows, we have the link_map structure in Linux. In both cases these structures describe loaded libraries (well, in Windows there's also an LDR_MODULE for the main executable).

link_map Structure
We do not need to know too much about this structure (for those interested - see "include/link.h" in your glibc sources). We may even define our own structure for that (a minimal one):

struct lmap
{
   void*    base_address;   /* Base address of the shared object */
   char*    path;           /* Absolute file name (path) of the shared object */
   void*    not_needed1;    /* Pointer to the dynamic section of the shared object */
   struct lmap *next, *prev;/* Chain of loaded objects */
};

There is some more information in the original structure, but we do not need it for now.

Getting There
So we know what the link_map structure looks like, and that is good. But how can we get there? Let me assume that you are aware of dynamic linking. In Linux we have the dl* functions:

dlopen - loads a shared object (LoadLibrary);
dlclose - unloads a shared object (FreeLibrary);
dlsym - gets the address of a symbol from the shared object (GetProcAddress).

The dlopen function returns a pseudo handle to the loaded shared object. These functions are declared in "dlfcn.h". You also have to explicitly link the dl library by passing -ldl to gcc (glibc 2.34 and later merge libdl into libc, so the flag is optional there).
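
To make the LoadLibrary / GetProcAddress / FreeLibrary mapping concrete, here is a minimal sketch of the three calls working together. The helper name call_from_library and the choice of "libm.so.6" and "cos" in the usage note are mine, picked only for illustration; a glibc-based system is assumed.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): loads `library`, looks up
 * `symbol` as a double(double) function, calls it with x and unloads the
 * library again. *ok is set to 1 on success and 0 on any failure. */
typedef double (*unary_fn)(double);

double call_from_library(const char *library, const char *symbol,
                         double x, int *ok)
{
    assert(library != NULL && symbol != NULL && ok != NULL);
    *ok = 0;

    void *handle = dlopen(library, RTLD_NOW);      /* ~ LoadLibrary */
    if (handle == NULL)
        return 0.0;

    double result = 0.0;
    void *sym = dlsym(handle, symbol);             /* ~ GetProcAddress */
    if (sym != NULL) {
        unary_fn fn;
        /* Conversion idiom from the POSIX dlsym documentation: avoids a
         * direct cast between object and function pointers. */
        *(void **)&fn = sym;
        result = fn(x);
        *ok = 1;
    }

    dlclose(handle);                               /* ~ FreeLibrary */
    return result;
}
```

Calling call_from_library("libm.so.6", "cos", 0.0, &ok) should load the math library, call cos and unload it again. When something goes wrong, dlerror() returns a human-readable description of the most recent failure.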

While in Windows a module handle (HMODULE) is equal to the base address of the module, in Linux the pseudo handle is in fact a pointer to the corresponding link_map structure. This means that getting to the head of the list of loaded modules is quite easy:

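struct lmap* get_list_head(void* handle)
{
   struct lmap* retval = (struct lmap*)handle;
   /* Walk backwards; stop at the start of the chain or at an entry without a path */
   while(NULL != retval->prev && NULL != retval->prev->path)
      retval = retval->prev;
   return retval;
}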
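
The same cast also works with the public struct link_map from <link.h>, whose first fields are l_addr, l_name, l_ld, l_next and l_prev, so you do not have to declare your own structure. The sketch below assumes glibc, where the handle really is a pointer to the link_map (an implementation detail, not a documented guarantee); the helper names path_from_handle and print_path_of are hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: treats a dlopen() handle as a struct link_map*
 * (glibc implementation detail) and returns the recorded path. */
const char *path_from_handle(void *handle)
{
    assert(handle != NULL);
    const struct link_map *map = (const struct link_map *)handle;
    return map->l_name;
}

/* Hypothetical helper: loads a library, prints where it was found and
 * unloads it. Returns 0 on success, -1 if the library cannot be loaded. */
int print_path_of(const char *library)
{
    void *handle = dlopen(library, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    printf("%s -> %s\n", library, path_from_handle(handle));
    dlclose(handle);
    return 0;
}
```

The string returned by path_from_handle belongs to the dynamic linker, so it should only be used while the library stays loaded.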

Things are a bit more complicated if you do not intend to load any shared object. You will still have to use the dl library, though.

First of all, you will have to call dlopen, despite the fact that you are not going to load anything. Call it with NULL as the first argument and RTLD_NOW as the second. The return value in this case is the pseudo handle for the main executable (similar to GetModuleHandle(NULL) in Windows), but it points to a different structure than link_map (to be honest, I've been too lazy to dig for it in the libc sources). This structure contains several pointers, and we are particularly interested in the fourth one. It points to another structure (which I was too lazy to dig for as well) with some other pointers/values, and we are (again) interested in the fourth one. This pointer, in turn, finally gets us to the first link_map structure. In my case, it is the structure that refers to libdl.so.2. Let's take a look at the procedure in C:

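/* Opaque layout: we only care about the fourth pointer-sized slot */
struct something
{
   void*  pointers[3];
   struct something* ptr;
};

struct lmap* pl;
void* ph = dlopen(NULL, RTLD_NOW);            /* pseudo handle of the main executable */
struct something* p = (struct something*)ph;
p = p->ptr;                                   /* follow the fourth pointer */
pl = (struct lmap*)p->ptr;                    /* and the fourth one again */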
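
If you would rather not count pointers inside undocumented structures, glibc also offers a documented route: dlinfo with the RTLD_DI_LINKMAP request stores the link_map that belongs to a handle. The sketch below is one way to use it, not a replacement for the approach above; the helper names link_map_of and first_link_map are hypothetical, and _GNU_SOURCE is needed because dlinfo is a GNU extension.

```c
#define _GNU_SOURCE          /* dlinfo() and RTLD_DI_LINKMAP are GNU extensions */
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical helper: returns the link_map behind a dlopen() handle,
 * or NULL if dlinfo() fails. */
struct link_map *link_map_of(void *handle)
{
    assert(handle != NULL);
    struct link_map *map = NULL;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) != 0)
        return NULL;
    return map;
}

/* Hypothetical helper: returns the first entry of the chain of loaded
 * objects for the current process, or NULL on failure. */
struct link_map *first_link_map(void)
{
    void *self = dlopen(NULL, RTLD_NOW);   /* handle of the main program */
    if (self == NULL)
        return NULL;

    struct link_map *map = link_map_of(self);
    /* Rewind to the start of the chain in case we did not begin there. */
    while (map != NULL && map->l_prev != NULL)
        map = map->l_prev;

    dlclose(self);   /* drops our reference; the main program stays mapped */
    return map;
}
```

The entry returned by first_link_map has no predecessor (its l_prev is NULL), and the rest of the chain is reachable through l_next.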


List Loaded Objects
Now we are ready to list all loaded objects. Assume p is a pointer to the first link_map (in our case lmap) structure:

while(NULL != p)
{
   printf("%s\n", p->path);
   p = p->next;
}

In my case the output is (about three times less than in a Windows process ;-) ):
/lib32/libdl.so.2
/lib32/libc.so.6
/lib/ld-linux.so.2
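
For completeness, here is a self-contained sketch of the whole procedure as a single function, written against the public struct link_map from <link.h> and the glibc-specific dlinfo call rather than the hand-made structures used above. The function name print_loaded_objects is hypothetical. Entries with an empty name are skipped, since on glibc the main executable is recorded that way.

```c
#define _GNU_SOURCE          /* dlinfo() and RTLD_DI_LINKMAP are GNU extensions */
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "address  path" for every named shared object
 * in the current process to `out` and returns how many lines were written. */
size_t print_loaded_objects(FILE *out)
{
    assert(out != NULL);

    void *self = dlopen(NULL, RTLD_NOW);   /* handle of the main program */
    if (self == NULL)
        return 0;

    struct link_map *map = NULL;
    if (dlinfo(self, RTLD_DI_LINKMAP, &map) != 0)
        map = NULL;

    /* Rewind to the first entry of the chain. */
    while (map != NULL && map->l_prev != NULL)
        map = map->l_prev;

    size_t printed = 0;
    for (; map != NULL; map = map->l_next) {
        if (map->l_name == NULL || map->l_name[0] == '\0')
            continue;                      /* unnamed entry, e.g. the main executable */
        fprintf(out, "%p  %s\n", (void *)map->l_addr, map->l_name);
        ++printed;
    }

    dlclose(self);
    return printed;
}
```

Calling print_loaded_objects(stdout) from main gives a listing similar to the one above, with the load address of each object in front of its path. The exact set of lines depends on your system and on what the program links against.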

C'est tout. We are done. The mechanism described above may be used either to enumerate loaded shared objects or to get their handles. I personally used it for amusement.

Just remember that in Linux, unlike Windows, a handle to an object is not its base address, but the address of (a pointer to) the corresponding link_map structure.


Hope this post was at least interesting (if not helpful). See you at the next!

4 comments:

  1. Great.
    Just a little typo... the last loop should be:

    while (NULL != pl)
    {
    printf("%s\n", pl->path);
    pl = pl->next;
    }

    (pl instead of p)

    Cheers.

  2. "Assume p is a pointer..." - the line before the loop code. "p" was intentional. Anyway, "pl" could be good too. Thanks for keeping an eye on that :)

  3. How about just reading /proc/self/maps? That will include all mappings, including any mmapped data files, the heap & stack, vdsos, etc. And it will show the individual mapped segments of each shared object (e.g. .text, .data, .bss, .rodata, etc.)

  4. Reading /proc/self/maps gives you too much unneeded information if all you need is the addresses and names of the shared objects. In addition, it does not provide you with the pseudo handles, just the addresses. It would also involve text processing.

    If you mean "why not use the conventional way?", then there is a better way of doing that than reading /proc/self/maps - dl_iterate_phdr.

