I often find myself using hex-strings of assembly instructions in C++ programs, for example, "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b" (a snippet from http://www.phrack.org/phrack/49/P49-14, as a canonical example of shellcode). Such hex-strings are common in penetration-testing tools, as well as in code-injection tools.

 

I was working on creating a code-injection tool in C++ last night to help with my malware analysis work. Since the code that I needed to inject was a buffer of x86 assembly instructions, I used RTA to type up the assembly code, saved the file, opened it in my hex editor, copied the instructions as a hex-string, and pasted it into my injector project. I could have used HIEW or OllyDbg or something else instead of RTA; I could have even written the assembly code in an __asm{...} block in C++ and compiled it to get the instructions. However, all of these solutions required copying a hex-string back into my injector program. This gets even more annoying if I want to <gasp> update my assembly code!

I thought, "Wouldn't it be nice if I could write the assembly code directly into my C++ program and be able to make use of that buffer without using any hex-strings?"

Well, I decided to implement a solution:

typedef struct _ASSEMBLY_BUFFER
{
    void* pBuffer;
    unsigned long ulSize;
} ASSEMBLY_BUFFER, *PASSEMBLY_BUFFER;

//
// Gets a pointer to the x86 assembly code buffer starting at the buffer_begin
// label. Also gets the size of the buffer.
//
void __fastcall GetAssemblyBuffer(PASSEMBLY_BUFFER)
{
    __asm
    {
        mov eax, offset buffer_begin ; Get address of first instruction in assembly
        mov [ecx], eax               ;  buffer and save it to .pBuffer
        mov edx, offset buffer_end
        sub edx, eax                 ; Determine difference between beginning and end
        mov [ecx+4], edx             ;  of assembly buffer, and save it to .ulSize
    }
    return;

    __asm
    {
buffer_begin:

        <assembly code>       ; Our assembly code buffer

buffer_end:
    }
}

Figure 1. GetAssemblyBuffer(...) function and typedef.


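We simply put our assembly code between the buffer_begin and buffer_end labels, and can then use GetAssemblyBuffer(...) to access it. Because the function is declared __fastcall, its first argument (the PASSEMBLY_BUFFER) arrives in ecx, which is why the inline assembly can write through [ecx] and [ecx+4] without the parameter having a name. Note that this relies on MSVC's __asm syntax, which is only available when targeting 32-bit x86.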

Take the following program for example:

#include <stdio.h>

typedef struct _ASSEMBLY_BUFFER
{
    void* pBuffer;
    unsigned long ulSize;
} ASSEMBLY_BUFFER, *PASSEMBLY_BUFFER;

//
// Gets a pointer to the x86 assembly code buffer starting at the buffer_begin
// label. Also gets the size of the buffer.
//
void __fastcall GetAssemblyBuffer(PASSEMBLY_BUFFER)
{
    __asm
    {
        mov eax, offset buffer_begin ; Get address of first instruction in assembly
        mov [ecx], eax               ;  buffer and save it to .pBuffer
        mov edx, offset buffer_end
        sub edx, eax                 ; Determine difference between beginning and end
        mov [ecx+4], edx             ;  of assembly buffer, and save it to .ulSize
    }
    return;

    __asm
    {
buffer_begin:

        mov eax, 15DBh        ; Our assembly code buffer
        rol eax, 13h
        xor eax, 0DEADBEEFh
        shr eax, 10h
        mov ebx, eax
        shl eax, 2
        add eax, ebx
        add eax, ebx
        add eax, ebx
        add eax, 4


buffer_end:
    }
}

int main(int argc, char** argv)
{
    ASSEMBLY_BUFFER asmbuf = {0};
    GetAssemblyBuffer(&asmbuf);

    printf("Assembly code buffer: ");

    for (unsigned long i = 0; i < asmbuf.ulSize; i++)
    {
        printf("\\x%02x", ((unsigned char*)asmbuf.pBuffer)[i]);
    }

    return 0;
}

Figure 2. Sample program that uses GetAssemblyBuffer(...).


The program above would output:

Assembly code buffer:
\xb8\xdb\x15\x00\x00\xc1\xc0\x13\x35\xef\xbe\xad\xde\xc1\xe8\x10\x8b\xd8\xc1\xe0
\x02\x03\xc3\x03\xc3\x03\xc3\x83\xc0\x04

Figure 3. Output of sample program above.


With this functionality, we can now do things like WriteProcessMemory(hProcess, lpBaseAddress, asmbuf.pBuffer, asmbuf.ulSize, lpNumberOfBytesWritten) or send(s, (const char*)asmbuf.pBuffer, (int)asmbuf.ulSize, flags) without having to paste any hex-strings into our C++ code.