TestRunner Technical Details

December 06, 2018

Technical implementation details of TestRunner - a unit testing framework for C/C++. Repository and code

Introduction

In this post I will go through some technical details and challenges I ran into while implementing the testrunner application. The main focus is the enumeration of exported symbols in dynamically loaded libraries.

If you take a look at the repository, the file where the magic happens is module.cpp. The rest is just stitching it together (more or less).

See: https://github.com/gnilk/testrunner/blob/master/src/testrunner/module.cpp

This post assumes you have a basic understanding of what a dynamically linked library is. A basic understanding of C (and pointers) is also good to have.

Way of Working

The test runner operates on dynamic library (.dylib) files. It opens a dynamic library and scans it for exported functions matching a certain pattern.
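To make the pattern concrete, here is a minimal sketch of what a library under test could export. The function name `test_math_add`, the helper `Add` and the convention that 0 means success are assumptions for illustration; the `void *` parameter and `int` return value mirror how the runner calls test functions later in this post.

```cpp
// Hypothetical test library source, e.g. built with:
//   clang++ -shared -o libmath_tests.dylib math_tests.cpp

// Code under test (illustrative only).
static int Add(int a, int b) { return a + b; }

// extern "C" disables C++ name mangling, so on macOS the symbol table
// entry for this function is "_test_math_add".
extern "C" int test_math_add(void *context) {
    (void)context;                    // runner-provided context, unused in this sketch
    return (Add(2, 3) == 5) ? 0 : 1;  // assumption: 0 = pass, non-zero = fail
}
```

Without `extern "C"` the compiler would emit a mangled C++ name, and a simple prefix match on the symbol name would no longer find the function.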

Step by Step:

load library (‘dlopen’)
find the image index
get the image header (‘dyld_get_image_header’)
parse the commands to find ‘LC_SYMTAB’
parse the symtab
extract exported functions

There seems to be no standard API to enumerate exported functions in binaries (executables, dylibs, static libraries, etc.). For a 64-bit Mach-O dylib you have to parse the symbol table, which is located through the load command table. This can be done once the library has been loaded by the application.

Application and dynamic libraries

The OS keeps track of the dynamic libraries loaded by an application. On macOS (and derivatives like iOS) this list is accessed like an array. Each entry contains the full path name of the loaded library, which makes each entry unique.

The layout of the dynamic library itself is described by a list of load commands (load_command). The image header records how many load commands there are and their total size.

struct mach_header_64 {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
	uint32_t	reserved;	/* reserved */
};

The load commands follow directly after the header.

struct load_command {
	uint32_t cmd;		/* type of load command */
	uint32_t cmdsize;	/* total size of command in bytes */
};

Each command contains information the OS needs to load and execute code in a library (or application). Commands are chunks of data located relative to the header (plus some alignment rules). This makes it easy for a loader to build intermediate lists of load commands and their locations using memory mapping.

In order to enumerate exported functions in a dynamic library we need to parse the command data for the symbol table. The symbol table is identified in the command list by LC_SYMTAB.

#define	LC_SYMTAB	0x2	/* link-edit stab symbol table info */
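The walk over the command list can be tried out without a real Mach-O image. The sketch below uses a local mirror of `load_command` and a synthetic two-entry table, so it builds on any platform; `LoadCommand`, `FindLoadCommand` and `MakeSampleCommands` are names made up for this example and are not part of the test runner.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Local mirror of struct load_command from <mach-o/loader.h>.
struct LoadCommand { uint32_t cmd; uint32_t cmdsize; };
constexpr uint32_t kLcSegment64 = 0x19;  // LC_SEGMENT_64
constexpr uint32_t kLcSymtab    = 0x2;   // LC_SYMTAB

// Returns the byte offset of the first command of type `wanted` inside `cmds`,
// or -1 if it is not present. `ncmds` is the count from the image header.
long FindLoadCommand(const std::vector<uint8_t> &cmds, uint32_t ncmds, uint32_t wanted) {
    size_t offset = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (offset + sizeof(LoadCommand) > cmds.size()) break;  // truncated table
        LoadCommand lc;
        std::memcpy(&lc, cmds.data() + offset, sizeof(lc));
        if (lc.cmd == wanted) return static_cast<long>(offset);
        if (lc.cmdsize < sizeof(LoadCommand)) break;  // malformed, avoid looping forever
        offset += lc.cmdsize;  // cmdsize is the full size of this command
    }
    return -1;
}

// Builds a synthetic table: a 72-byte LC_SEGMENT_64 followed by a 24-byte LC_SYMTAB.
std::vector<uint8_t> MakeSampleCommands() {
    std::vector<uint8_t> buf(72 + 24, 0);
    LoadCommand seg{kLcSegment64, 72};
    LoadCommand sym{kLcSymtab, 24};
    std::memcpy(buf.data(), &seg, sizeof(seg));
    std::memcpy(buf.data() + 72, &sym, sizeof(sym));
    return buf;
}
```

With the sample table, `FindLoadCommand(MakeSampleCommands(), 2, kLcSymtab)` lands on offset 72, directly after the segment command.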

Loading a library

Loading the library is trivial and well documented:

    handle = dlopen(pathName.c_str(), RTLD_LAZY);

Once you have the handle you need to find the image (i.e. library) index for the library in the hosting application (i.e. the one loading the library), which in my case is the test runner.

Find the image index

You can query the image (library) count with the _dyld_image_count function and loop over the range, fetching the image name for each index with _dyld_get_image_name. Match that image name against the path of your loaded library.

The image index is needed to get the image header.

int Module::FindImage() {
    int nImages = _dyld_image_count();
    for(int i=0;i<nImages;i++) {
        std::string imageName(_dyld_get_image_name(i));
        if (imageName.find(pathName.c_str()) != std::string::npos) {
            idxLib = i;
            break;
        }
    }
    if (idxLib == -1) {
        pLogger->Debug("Image not found!");
    }
    return idxLib;
}

Apple official documentation
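The substring match in FindImage is simple, but it can also hit an unrelated image whose path merely contains the library name. A slightly stricter variant compares the end of the path instead. The sketch below is an assumption-laden alternative and not what the test runner does: `EndsWith` and `FindImageIndex` are hypothetical helpers, and the image names are passed in as a vector so the example does not depend on dyld. Note that the path reported for an image may differ from the string passed to dlopen, so some normalization may still be needed.

```cpp
#include <string>
#include <vector>

// Returns true when `text` ends with `suffix`.
static bool EndsWith(const std::string &text, const std::string &suffix) {
    return text.size() >= suffix.size() &&
           text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: index of the first image whose path ends with `libPath`,
// or -1 when no image matches.
int FindImageIndex(const std::vector<std::string> &imageNames, const std::string &libPath) {
    for (size_t i = 0; i < imageNames.size(); i++) {
        if (EndsWith(imageNames[i], libPath)) return static_cast<int>(i);
    }
    return -1;
}
```

Given the names "/usr/lib/libSystem.B.dylib", "/tmp/build/libtests.dylib.bak" and "/tmp/build/libtests.dylib", a lookup for "libtests.dylib" skips the ".bak" entry that a plain substring search would have matched first.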

Retrieving the image header

The image header is actually a pointer (in memory) to the image itself.

    header = (struct mach_header *)_dyld_get_image_header(idxLib);

Once we have the header we can parse the command table and find the symbol table.

Parsing the command table

As the documentation states, the command table starts directly after the binary header.

The header itself contains the number of commands. Just loop over the range and skip commands of no interest. When we find the symbol table, we process it.

Note: We have to align the pointers correctly.

bool Module::ParseCommands() {

    uint8_t *ptrData = (uint8_t *)header;
    ptrData += sizeof(struct mach_header_64);
    ptrData = AlignPtr(ptrData);

    for(uint32_t i=0;i<header->ncmds;i++) {
        struct load_command *pcmd = (struct load_command *)ptrData;

        if (pcmd->cmd == LC_SYMTAB) {
            ProcessSymtab(ptrData);
        }
        ptrData += pcmd->cmdsize;
        ptrData = AlignPtr(ptrData);
    }

    return true;
}
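AlignPtr is not shown in this post. One possible implementation, assuming 8-byte alignment, rounds the address up to the next multiple of 8. The names `AlignUp` and `AlignPtr8` are made up for this sketch. The comments in `<mach-o/loader.h>` state that cmdsize must be a multiple of 8 bytes on 64-bit architectures, so for a well-formed image this step should leave the pointer unchanged.

```cpp
#include <cstddef>
#include <cstdint>

// Rounds `value` up to the next multiple of `alignment`.
// `alignment` must be a power of two.
constexpr uintptr_t AlignUp(uintptr_t value, uintptr_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stand-in for AlignPtr: align a byte pointer to 8 bytes.
inline uint8_t *AlignPtr8(uint8_t *ptr) {
    return reinterpret_cast<uint8_t *>(AlignUp(reinterpret_cast<uintptr_t>(ptr), 8));
}
```

Already aligned values pass through untouched: `AlignUp(24, 8)` stays 24, while `AlignUp(25, 8)` becomes 32.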

Enumerate functions in binaries

The symbol table is an array. The ‘cmd’ and ‘cmdsize’ fields are the same as in ‘load_command’. They are redeclared in the symtab_command structure so that a load command pointer can be cast to it directly.

struct symtab_command {
	uint32_t	cmd;		/* LC_SYMTAB */
	uint32_t	cmdsize;	/* sizeof(struct symtab_command) */
	uint32_t	symoff;		/* symbol table offset */
	uint32_t	nsyms;		/* number of symbol table entries */
	uint32_t	stroff;		/* string table offset */
	uint32_t	strsize;	/* string table size in bytes */
};

Note: ‘symoff’ is the offset of the actual symbol table, relative to the start of the file (i.e. the header). The same goes for ‘stroff’ and the string table.

Each symbol name in the string table is a null-terminated string. Just traverse the table one name at a time until ‘strsize’ bytes have been consumed.

Once done we have a list of all exported symbols in the dynamic library.

void Module::ParseSymTabNames(uint8_t *ptrData) {
    // Locate string table for all symbols (stroff - is relative 0 from file start)
    struct symtab_command *symtab = (struct symtab_command *)ptrData;

    uint8_t *ptrStringTable = FromOffset32(symtab->stroff);
    uint32_t bytes = 0;
    // Loop over the string table and collect every name; filtering for "_test_" happens later
    while(bytes < symtab->strsize) {
        std::string symbolName((char *)&ptrStringTable[bytes]);
        if (symbolName.length() > 1) {
            // Just dump all symbols to a large map            
            if (symbols.find(symbolName) == symbols.end()) {
                symbols.insert(std::pair<std::string, int>(symbolName, bytes));
            }
        }       
        bytes += symbolName.length();
        bytes++;
    }
}
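The traversal can be tried on a hand-made string table. The sketch below is a bounds-checked variant of the loop above and not the test runner's code: `SplitStringTable` is a hypothetical name, and it uses memchr so that a missing terminator at the end of the table cannot cause a read past ‘strsize’ bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Splits a string table (consecutive NUL-terminated names) into individual names.
// Empty entries are skipped and reading never goes past `size` bytes.
std::vector<std::string> SplitStringTable(const char *table, size_t size) {
    std::vector<std::string> names;
    size_t pos = 0;
    while (pos < size) {
        // memchr limits the search to the remaining bytes of the table.
        const void *end = std::memchr(table + pos, '\0', size - pos);
        size_t len = end ? static_cast<size_t>(static_cast<const char *>(end) - (table + pos))
                         : size - pos;
        if (len > 0) names.emplace_back(table + pos, len);
        pos += len + 1;  // step over the name and its terminator
    }
    return names;
}
```

For a table holding an empty first entry followed by "_test_alpha", "_helper" and "_test_beta", the function returns those three names in order.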

Extracting the right symbol

Go through the list of all parsed symbols and check whether they match our pattern. Remember that exported C symbols are prefixed with an underscore ‘_’.

In the case of the test runner this is searching for functions starting with “_test_” (actually I just search for any function with pattern “_test_”).

void Module::ExtractTestFunctionFromSymbols() {
    // Extract valid test functions
    for(auto x:symbols) {
        if (IsValidTestFunc(x.first)) {
            exports.push_back(x.first);
        }
    }
}

bool Module::IsValidTestFunc(std::string funcName) {
    // A valid test function name starts with "_test_"
    if (funcName.find("_test_",0) == 0) {
        return true;
    }
    return false;
}

Invoking test functions

Getting a handle to the actual function and invoking it is done with standard functions.

Once we have the export name we can simply call ‘dlsym’ on the library handle (from dlopen) and retrieve the function pointer.

    void *ptrInvoke = dlsym(handle, exportName.c_str());
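One detail to watch: on macOS, dlsym expects the name as it is written in C source, without the leading underscore that appears in the symbol table. If the names collected from the string table are passed along unchanged, the lookup will fail. A minimal sketch of the conversion follows; `ToDlsymName` is a hypothetical helper name.

```cpp
#include <string>

// Strips one leading underscore so that a Mach-O symbol table name
// ("_test_alpha") becomes the C-level name dlsym expects ("test_alpha").
std::string ToDlsymName(const std::string &symbolName) {
    if (!symbolName.empty() && symbolName[0] == '_') {
        return symbolName.substr(1);
    }
    return symbolName;
}
```

The lookup then becomes `dlsym(handle, ToDlsymName(exportName).c_str())`.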

In the test runner this is done in testfunc.cpp.

Excerpt of code in function ‘TestFunc::Execute’:

    PTESTFUNC pFunc = (PTESTFUNC)module->FindExportedSymbol(symbolName);

    if (pFunc != NULL) {
        // 1) Setup test response proxy
        TestResponseProxy *trp = TestResponseProxy::GetInstance();
        trp->Begin(symbolName, moduleName);

        // 2) call function to be tested
        int testReturnCode = pFunc((void *)trp->Proxy());
        // 3) Stop test
        trp->End();

        // 4) Gather data from test
        testResult->SetResult(trp->Result());
        testResult->SetNumberOfErrors(trp->Errors());
        testResult->SetTimeElapsedSec(trp->ElapsedTimeInSec());
        
        // Overwrite the result based on return code
        HandleTestReturnCode(testReturnCode, testResult);
    }
    SetExecuted();
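The definition of PTESTFUNC is not shown here. Judging from the call site it is a pointer to a function taking a `void *` and returning an `int`. The sketch below assumes that signature and shows the bare mechanics of calling through such a pointer and timing the call. `RunResult`, `RunTest` and `SampleTest` are made up for the example; in the real runner the timing and error counts come from TestResponseProxy.

```cpp
#include <chrono>

// Signature inferred from the call site: opaque pointer in, int out.
typedef int (*PTESTFUNC)(void *param);

struct RunResult {
    int returnCode;     // value returned by the test function
    double elapsedSec;  // wall-clock duration of the call
};

// Calls a test function through its pointer and measures how long the call takes.
RunResult RunTest(PTESTFUNC func, void *param) {
    auto start = std::chrono::steady_clock::now();
    int rc = func(param);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return RunResult{rc, elapsed.count()};
}

// Stand-in for a function pointer that would normally come from dlsym.
static int SampleTest(void *param) { return param == nullptr ? 0 : 1; }
```

In the runner the pointer returned by dlsym would be cast to PTESTFUNC and passed in place of `SampleTest`.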