Reverse Engineering Microsoft OLE
For the experienced reverse engineer, a basic analysis of what a particular piece of malware does can be a relatively quick and painless process. Simply load the executable component into IDA or OllyDbg and let the auto-analysis match up import names with function calls. From these scraps of information, one can make an educated guess about what a particular subroutine does. Labeling these subroutines creates more named cross-references that can in turn reveal more about the overall functionality of the program. However, when it comes to malware that depends heavily on OLE calls, the usual analysis speed comes to a grinding halt. This is because COM, the standard on which OLE is based, has programs request interfaces at run time and call their methods indirectly, through tables of function pointers. In essence, it is just another import table, but our disassembler doesn't understand it, and the method names we need are not stored in the binary at all: all that remains is a GUID and a numeric offset. So we are left in the dark, wondering what function CALL DWORD PTR DS:[ECX+54] actually reaches. Tracing the call into the remote object is painful and not terribly useful, as there is no export table to tell us where we are at any given time. With a little knowledge of how OLE/COM virtual method tables work, it is possible to recover the missing information and present a clearer picture of what is happening in the disassembly. To start, let's examine a key piece of code from the Submithook trojan's BHO (Browser Helper Object) DLL:
10001BF4 LEA EAX,DWORD PTR SS:[EBP-120]
10001BFA LEA ECX,DWORD PTR SS:[EBP-18]
10001BFD PUSH EAX
10001BFE MOV DWORD PTR SS:[EBP-18],EBX
10001C01 CALL 10003352
10001C06 CMP EAX,EBX
10001C08 JGE SHORT newsubmi.10001C17
10001C0A CMP EAX,80004002
10001C0F JE SHORT newsubmi.10001C17
10001C11 PUSH EAX
10001C12 CALL 1000AA08
10001C17 MOV EAX,DWORD PTR SS:[EBP-18]
10001C1A CMP EAX,EBX
10001C1C JE newsubmi.10002212
10001C22 MOV DWORD PTR SS:[EBP-1C],EBX
10001C25 MOV ECX,DWORD PTR DS:[EAX]
10001C27 LEA EDX,DWORD PTR SS:[EBP-1C]
10001C2A PUSH EDX
10001C2B PUSH EAX
10001C2C MOV BYTE PTR SS:[EBP-4],0C
10001C30 CALL DWORD PTR DS:[ECX+30]
By itself, this is not very descriptive. As a hint, the CALL at 10001C01 obtains an OLE interface pointer, and the CALL at 10001C30 calls one of that interface's methods. Just knowing the name of that method is enough to understand why this piece of code is critical. To find the method name, we first need to know which OLE interface is being used. The call to 10003352 stores the interface pointer in the local variable at [EBP-18], and the comparison of its return value with 80004002 (E_NOINTERFACE) hints that an interface is being requested inside it. Tracing into the call at this point, however, tells us little about which interface that is. What we are looking for is the QueryInterface call that obtains the interface pointer. Because every COM interface derives from IUnknown, QueryInterface is always the first entry in the interface's virtual method table. So, tracing through the first call at 1000336C, we find this:
100037AA MOV ECX,DWORD PTR DS:[EAX]
100037AC LEA EDX,DWORD PTR SS:[EBP+8]
100037AF PUSH EDX
100037B0 PUSH newsubmi.1001C1E0
100037B5 PUSH EAX
100037B6 CALL DWORD PTR DS:[ECX]
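This is a dereference followed by a call through the resulting pointer: ECX receives the virtual method table pointer stored at the start of the object in EAX, and CALL DWORD PTR DS:[ECX] calls the table's first entry. Looking at the definition of the standard OLE QueryInterface method: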
HRESULT QueryInterface(
REFIID iid,
void ** ppvObject
);
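COM methods are called with the stdcall convention, so arguments are pushed right to left and the last PUSH before the CALL supplies the first parameter, which for a COM method is the hidden interface pointer (This). The short Python sketch below is a hypothetical helper, not part of the author's tools; it pairs the PUSH operands at 100037AF through 100037B5, copied from the listing in execution order, with the parameters of QueryInterface.

```python
def map_stdcall_args(pushes, params):
    """Pair PUSH operands, given in execution order, with declared parameters.

    stdcall pushes arguments right to left, so the last PUSH before the CALL
    supplies the first parameter. For a COM method that first parameter is
    the hidden interface pointer, This.
    """
    if len(pushes) != len(params):
        raise ValueError("expected exactly one PUSH per parameter")
    return dict(zip(params, reversed(pushes)))


# PUSH operands at 100037AF, 100037B0 and 100037B5, in execution order.
pushes = ["EDX", "newsubmi.1001C1E0", "EAX"]
# QueryInterface's declared parameters, with the implicit This made explicit.
params = ["This", "iid", "ppvObject"]

for name, operand in map_stdcall_args(pushes, params).items():
    print(f"{name:<9} <- {operand}")
# This      <- EAX
# iid       <- newsubmi.1001C1E0
# ppvObject <- EDX
```

The mapping shows that 1001C1E0 is passed as iid, which is why that address is the one worth dumping.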
The first declared parameter, iid, is actually the second argument, since QueryInterface is itself a method of IUnknown and also receives the interface pointer as a hidden first argument. iid refers to the GUID of the interface we are requesting, and we can match this GUID to its interface name by searching the registry. In this case, the GUID is located at address 1001C1E0. If we look at the dump, we see the 16 bytes that make up the GUID:
1001C1E0 D2 F5 50 30 B5 98 CF 11
1001C1E8 BB 82 00 AA 00 BD CE 0B
Written in standard GUID notation, this is 3050F5D2-98B5-11CF-BB82-00AA00BDCE0B. (The leading 32-bit value and the two 16-bit values that follow it are stored little-endian; the final eight bytes are a plain byte array written out in memory order.) To find the corresponding interface name, one need only look up the default value of the registry key HKCR\Interface\{3050F5D2-98B5-11CF-BB82-00AA00BDCE0B}, which is:
IHTMLInputElement
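The byte-order conversion and the registry lookup are easy to script. Below is a minimal Python sketch with hypothetical helper names (it is not the author's Perl tooling): guid_from_dump unpacks the dumped bytes with struct, the standard library's uuid.UUID(bytes_le=...) serves as a cross-check because it expects the same layout, and the lookup uses the Windows-only winreg module, returning None on other systems or when the key is missing.

```python
import struct
import uuid


def guid_from_dump(raw: bytes) -> str:
    """Format the 16 bytes of an in-memory GUID in registry notation."""
    # Data1 (32-bit) and Data2/Data3 (16-bit) are little-endian; Data4 is an
    # 8-byte array that is written out in memory order.
    data1, data2, data3, data4 = struct.unpack("<IHH8s", raw)
    tail = data4.hex().upper()
    return f"{data1:08X}-{data2:04X}-{data3:04X}-{tail[:4]}-{tail[4:]}"


def interface_key(guid: str) -> str:
    """Subkey of HKEY_CLASSES_ROOT whose default value names the interface."""
    return "Interface\\{" + guid + "}"


def lookup_interface_name(guid: str):
    """Read the interface name from the registry; None if unavailable."""
    try:
        import winreg  # standard library, but only present on Windows
    except ImportError:
        return None
    try:
        # QueryValue returns the key's default (unnamed) value as a string.
        return winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, interface_key(guid))
    except OSError:
        return None


# The 16 bytes dumped at 1001C1E0.
raw = bytes.fromhex("D2 F5 50 30 B5 98 CF 11 BB 82 00 AA 00 BD CE 0B")
guid = guid_from_dump(raw)
# Cross-check: uuid's bytes_le argument expects this same mixed-endian layout.
assert guid == str(uuid.UUID(bytes_le=raw)).upper()
print(guid)                         # 3050F5D2-98B5-11CF-BB82-00AA00BDCE0B
print(lookup_interface_name(guid))  # the name on Windows if registered, else None
```

On a Windows machine where MSHTML's interfaces are registered, the second line should print IHTMLInputElement; what HKCR\Interface contains depends on what is installed.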
IHTMLInputElement is the interface being requested. Back to the bottom of the original code snippet:
10001C17 MOV EAX,DWORD PTR SS:[EBP-18]
10001C1A CMP EAX,EBX
10001C1C JE newsubmi.10002212
10001C22 MOV DWORD PTR SS:[EBP-1C],EBX
10001C25 MOV ECX,DWORD PTR DS:[EAX]
10001C27 LEA EDX,DWORD PTR SS:[EBP-1C]
10001C2A PUSH EDX
10001C2B PUSH EAX
10001C2C MOV BYTE PTR SS:[EBP-4],0C
10001C30 CALL DWORD PTR DS:[ECX+30]
Remember, our interface pointer is stored at [EBP-18], and here it is loaded into EAX. The first DWORD of the object it points to is then loaded into ECX: this is the address of our virtual method table, an array of 4-byte function pointers, so calling [ECX] will call method 1 of our interface, [ECX+4] will call method 2, and so on. Our call is to [ECX+30]; OllyDbg displays the offset in hexadecimal, and 0x30 / 4 = 12, so this is slot 12 counting from zero, or method 13.
At this point, we only need to know the name of method 13 of the IHTMLInputElement interface, and we've solved the mystery. The method layouts are declared in the Windows header files, so we need to seek out our favorite Windows compiler's "Include" directory. One of these files contains the virtual method table layout for our interface, and we can locate it by searching the header files for the text InterfaceNameVtbl, where InterfaceName is the name of the interface. In this case, searching for IHTMLInputElementVtbl turns up a hit in the file mshtmlc.h. Open this file and find the BEGIN_INTERFACE label under IHTMLInputElementVtbl. The table begins with the three IUnknown methods and the four IDispatch methods that IHTMLInputElement inherits, followed by its own methods. Counting through the method declarations, we find at number 13:
HRESULT ( STDMETHODCALLTYPE __RPC_FAR *get_name )(
IHTMLInputElement __RPC_FAR * This,
BSTR __RPC_FAR *p);
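The counting step is plain arithmetic over a list of slots. The Python sketch below uses hypothetical names and is not the author's script; it hard-codes the first thirteen entries of IHTMLInputElementVtbl (the seven inherited IUnknown and IDispatch methods, then the interface's own methods in declaration order). The entries between Invoke and get_name are transcribed from the header layout and should be checked against your own copy of the header. Slots are assumed to be 4-byte pointers, as in 32-bit code.

```python
# First thirteen slots of IHTMLInputElementVtbl, in the order the C header
# declares them (transcribed by hand; verify against your SDK's header).
IHTMLINPUTELEMENT_VTBL = [
    "QueryInterface", "AddRef", "Release",                         # IUnknown
    "GetTypeInfoCount", "GetTypeInfo", "GetIDsOfNames", "Invoke",  # IDispatch
    "put_type", "get_type", "put_value", "get_value",              # IHTMLInputElement
    "put_name", "get_name",
]


def method_for_offset(vtbl, offset, pointer_size=4):
    """Map the displacement in CALL DWORD PTR [ECX+offset] to a method name.

    Each vtable slot holds one function pointer, so the zero-based slot index
    is offset // pointer_size (4 bytes on 32-bit x86).
    """
    if offset % pointer_size:
        raise ValueError(f"offset {offset:#x} is not a multiple of {pointer_size}")
    index = offset // pointer_size
    if index >= len(vtbl):
        raise IndexError(f"slot {index} is beyond the {len(vtbl)} listed entries")
    return vtbl[index]


# OllyDbg shows displacements in hex: [ECX+30] is offset 0x30 (48 bytes).
offset = 0x30
slot = offset // 4
print(f"slot {slot} (method {slot + 1}):", method_for_offset(IHTMLINPUTELEMENT_VTBL, offset))
# prints: slot 12 (method 13): get_name
```

For 64-bit code the same lookup applies with pointer_size=8, since each slot is then an 8-byte pointer.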
So, our call to [ECX+30] is actually a call to IHTMLInputElement->get_name, which retrieves the name attribute of an HTML input element. But it was a lot of work to get here! To make analysis quicker, I wrote a set of Perl scripts to:
Incorporating this functionality into your favorite disassembler is left as an exercise to the reader.
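The Python sketch below is not the author's Perl tooling; it is a rough stand-in for the header-search step only, with hypothetical names. It finds the typedef struct InterfaceNameVtbl block, collects the method names between BEGIN_INTERFACE and END_INTERFACE in slot order, and indexes that list by offset. It assumes the MIDL-generated C layout quoted in the get_name excerpt (with or without __RPC_FAR), ignores preprocessor conditionals, and defaults to 4-byte slots for 32-bit code.

```python
import re
import sys

# One vtable slot per "( STDMETHODCALLTYPE [__RPC_FAR] *Name )" declaration.
METHOD_RE = re.compile(r"STDMETHODCALLTYPE\s+(?:__RPC_FAR\s+)?\*\s*(\w+)\s*\)")


def vtable_methods(header_text, interface):
    """Return the method names of <interface>Vtbl in slot order, or None."""
    struct_re = re.compile(r"typedef\s+struct\s+" + re.escape(interface) + r"Vtbl\b")
    match = struct_re.search(header_text)
    if match is None:
        return None
    begin = header_text.find("BEGIN_INTERFACE", match.end())
    if begin < 0:
        return None
    begin += len("BEGIN_INTERFACE")
    end = header_text.find("END_INTERFACE", begin)
    if end < 0:
        return None
    return METHOD_RE.findall(header_text, begin, end)


def method_at(header_text, interface, offset, pointer_size=4):
    """Name the method called through [vtable+offset], or None if unknown."""
    methods = vtable_methods(header_text, interface)
    if methods is None or offset % pointer_size:
        return None
    index = offset // pointer_size
    return methods[index] if index < len(methods) else None


# A trimmed excerpt in the style of a MIDL-generated C header, for demonstration.
SAMPLE_HEADER = """
    typedef struct IHTMLInputElementVtbl
    {
        BEGIN_INTERFACE

        HRESULT ( STDMETHODCALLTYPE __RPC_FAR *QueryInterface )(
            IHTMLInputElement __RPC_FAR * This,
            REFIID riid,
            void __RPC_FAR *__RPC_FAR *ppvObject);

        ULONG ( STDMETHODCALLTYPE __RPC_FAR *AddRef )(
            IHTMLInputElement __RPC_FAR * This);

        ULONG ( STDMETHODCALLTYPE __RPC_FAR *Release )(
            IHTMLInputElement __RPC_FAR * This);

        END_INTERFACE
    } IHTMLInputElementVtbl;
"""

if __name__ == "__main__":
    # Hypothetical usage: python vtbl.py HEADER INTERFACE OFFSET_HEX
    if len(sys.argv) == 4:
        with open(sys.argv[1], encoding="latin-1") as f:
            text = f.read()
        print(method_at(text, sys.argv[2], int(sys.argv[3], 16)))
    else:
        print(method_at(SAMPLE_HEADER, "IHTMLInputElement", 0x8))  # Release
```

Pointed at the real header, an invocation such as python vtbl.py mshtmlc.h IHTMLInputElement 30 should print get_name, provided the header declares the vtable in this layout.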