Reverse Engineering Microsoft OLE

Monday, September 12 2005 00:19.51 CDT
Author: joestewart

For the experienced reverse-engineer, a basic analysis of what a particular piece of malware does can be a relatively quick and painless process. Simply load the executable component into IDA or OllyDbg and let the auto-analysis match up import names with function calls. From these scraps of information, you can make a reasonable guess about what a particular subroutine does. Labeling these subroutines creates more function cross-references that can in turn reveal more about the overall functionality of the program.

However, when it comes to malware that depends heavily on OLE calls, the usual analysis speed comes to a grinding halt. This is because the COM standard on which OLE is based defines a protocol for function calls that allows interfaces and methods to be queried at run time. In essence, it is just another import table, but our disassembler doesn't understand it, and the scraps of information we need are encoded and removed from the binary itself. We are left in the dark wondering what function CALL DWORD PTR [ECX+54] is actually linked to. Tracing the function into the remote object is painful and not terribly useful, as there is no export table that tells us where we are at any given time.

With a little knowledge of how OLE/COM virtual method tables work, it is possible to extract the needed information and present a clearer picture of what is happening in the disassembly. To start, let's examine a key piece of code from the Submithook trojan BHO dll:

    10001BF4  LEA EAX,DWORD PTR SS:[EBP-120]
    10001BFD  PUSH EAX
    10001C01  CALL 10003352
    10001C06  CMP EAX,EBX
    10001C08  JGE SHORT newsubmi.10001C17
    10001C0A  CMP EAX,80004002
    10001C0F  JE SHORT newsubmi.10001C17
    10001C11  PUSH EAX
    10001C12  CALL 1000AA08
    10001C17  MOV EAX,DWORD PTR SS:[EBP-18]
    10001C1A  CMP EAX,EBX
    10001C1C  JE newsubmi.10002212
    10001C22  MOV DWORD PTR SS:[EBP-1C],EBX
    10001C25  MOV ECX,DWORD PTR DS:[EAX]
    10001C27  LEA EDX,DWORD PTR SS:[EBP-1C]
    10001C2A  PUSH EDX
    10001C2B  PUSH EAX
    10001C2C  MOV BYTE PTR SS:[EBP-4],0C
    10001C30  CALL DWORD PTR DS:[ECX+30]

By itself, this is not very descriptive. As a hint, the CALL at 10001C01 gets an OLE interface object, and the CALL at 10001C30 invokes one of that interface's methods. Just knowing the name of that method is enough to understand why this piece of code is critical. To find the method name, we first need to know which OLE interface is being used.

The call to 10003352 stores the interface object in the local variable at [EBP-18]. Tracing into the call at this point does not tell us much more about which interface is being referenced. What we are looking for is the QueryInterface call that is used to initialize the interface object. QueryInterface is always the first pointer in the method table that an interface object points to. So, tracing through the first call at 1000336C, we find this:

    100037AC  LEA EDX,DWORD PTR SS:[EBP+8]
    100037AF  PUSH EDX
    100037B0  PUSH newsubmi.1001C1E0
    100037B5  PUSH EAX
    100037B6  CALL DWORD PTR DS:[ECX]

This is a call through the first pointer in the table that ECX points to, which is exactly what we are looking for. Compare it with the definition of the standard OLE QueryInterface call:

    HRESULT QueryInterface(
      REFIID iid,
      void ** ppvObject
    );

The first declared argument, iid, is the GUID of the interface we are requesting. (On the stack it is actually the second argument, since QueryInterface is itself a method of the IUnknown interface and receives the object pointer first.) We can match this GUID to its interface name by searching the registry. In this case, the GUID is located at address 1001C1E0. If we look at the dump, we see the 16 bytes that make up the GUID:

    1001C1E0   D2 F5 50 30 B5 98 CF 11
    1001C1E8   BB 82 00 AA 00 BD CE 0B

Converted to its string form, the GUID is: 3050F5D2-98B5-11CF-BB82-00AA00BDCE0B (The first 32-bit long is little-endian, the next two 16-bit shorts are little-endian, and the remaining eight bytes are stored in order as a bytestring.)
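The byte shuffling is easy to get wrong by hand, so here is a minimal Python sketch of the conversion. It is an illustration only, not part of the original tools; the function name guid_from_dump is made up, and the input is simply the 16 bytes from the dump above.

```python
import struct
import uuid

# The 16 bytes found at 1001C1E0 in the dump.
RAW = bytes.fromhex("D2F55030B598CF11" "BB8200AA00BDCE0B")


def guid_from_dump(raw):
    """Convert 16 bytes of an in-memory GUID to its registry-style string."""
    if len(raw) != 16:
        raise ValueError("a GUID is exactly 16 bytes")
    # One little-endian 32-bit value, two little-endian 16-bit values,
    # then eight bytes taken in the order they appear.
    data1, data2, data3, data4 = struct.unpack("<IHH8s", raw)
    return "%08X-%04X-%04X-%s-%s" % (
        data1, data2, data3,
        data4[:2].hex().upper(), data4[2:].hex().upper())


print(guid_from_dump(RAW))  # 3050F5D2-98B5-11CF-BB82-00AA00BDCE0B

# The standard library does the same thing through the bytes_le argument.
print(str(uuid.UUID(bytes_le=RAW)).upper())
```

Both lines print the same string, which is the value to look up under HKCR\Interface.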

To find the corresponding interface name, one need only look at the registry key HKCR\Interface\{3050F5D2-98B5-11CF-BB82-00AA00BDCE0B} and read its default value:

    IHTMLInputElement

This is the interface being requested.

Back to the bottom of the original code snippet:

    10001C17  MOV EAX,DWORD PTR SS:[EBP-18]
    10001C1A  CMP EAX,EBX
    10001C1C  JE newsubmi.10002212
    10001C22  MOV DWORD PTR SS:[EBP-1C],EBX
    10001C25  MOV ECX,DWORD PTR DS:[EAX]
    10001C27  LEA EDX,DWORD PTR SS:[EBP-1C]
    10001C2A  PUSH EDX
    10001C2B  PUSH EAX
    10001C2C  MOV BYTE PTR SS:[EBP-4],0C
    10001C30  CALL DWORD PTR DS:[ECX+30]

Remember, our interface object is stored in [EBP-18], which is loaded into EAX. The first pointer in the object is then dereferenced into ECX - this is our virtual method table, so calling [ECX] will call method 1 of our interface, [ECX+4] will call method 2, and so on. We see above that our call is to [ECX+30]; hex 30 is 48, and 48 / 4 = 12 slots past the first, so this is method 13.
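The offset arithmetic is a common source of off-by-one mistakes, so here is a minimal Python sketch of it. It assumes 32-bit code (4-byte pointers) and counts methods from 1, as the text does; the name vtable_slot is made up for illustration.

```python
POINTER_SIZE = 4  # assumption: 32-bit x86, as in the listing above


def vtable_slot(offset, pointer_size=POINTER_SIZE):
    """Return the 1-based method number for CALL [reg+offset]."""
    if offset < 0 or offset % pointer_size:
        raise ValueError("offset is not a whole number of pointers")
    return offset // pointer_size + 1


print(vtable_slot(0x00))  # 1, always QueryInterface
print(vtable_slot(0x04))  # 2
print(vtable_slot(0x30))  # 13, the call at 10001C30
```

Note that the offsets in the disassembly are hexadecimal, so they must be written with the 0x prefix when passed in.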

At this point, we only need to know the name of method 13 of the IHTMLInputElement interface, and we've solved the mystery. The method declarations are found in the Windows header files, so we need to seek out our favorite Windows compiler's "Include" directory. One of these files contains the virtual method table layout for our interface. We can locate it by searching the header files for the interface name followed by Vtbl. In this case, searching for IHTMLInputElementVtbl turns up a hit in the file mshtmlc.h. Open this file and find the BEGIN_INTERFACE label under IHTMLInputElementVtbl. Counting through the method declarations, we find at number 13:

    HRESULT ( STDMETHODCALLTYPE __RPC_FAR *get_name )(
        IHTMLInputElement __RPC_FAR * This,
        BSTR __RPC_FAR *p);
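Counting declarations by eye is tedious, so the step can be sketched in Python. This is not the author's Perl script, only an illustration of the idea: it assumes the header uses the C-style "typedef struct XxxVtbl { ... } XxxVtbl;" layout with STDMETHODCALLTYPE function pointers. The function name list_vtbl_methods and the IExample sample text are made up for the demonstration.

```python
import re


def list_vtbl_methods(header_text, interface):
    """Return the method names of <interface>Vtbl in declaration order."""
    name = re.escape(interface) + "Vtbl"
    struct = re.search(
        r"typedef\s+struct\s+" + name + r"\s*\{(.*?)\}\s*" + name + r"\s*;",
        header_text, re.DOTALL)
    if struct is None:
        return []
    # Each slot looks like: HRESULT ( STDMETHODCALLTYPE __RPC_FAR *Name )(
    # The __RPC_FAR macro is optional because newer headers omit it.
    return re.findall(
        r"\(\s*STDMETHODCALLTYPE\s+(?:__RPC_FAR\s+)?\*\s*(\w+)\s*\)",
        struct.group(1))


# Hypothetical, shortened header text used only to exercise the parser.
SAMPLE = """
typedef struct IExampleVtbl
{
    BEGIN_INTERFACE

    HRESULT ( STDMETHODCALLTYPE __RPC_FAR *QueryInterface )(
        IExample __RPC_FAR * This,
        REFIID riid,
        void __RPC_FAR *__RPC_FAR *ppvObject);

    ULONG ( STDMETHODCALLTYPE __RPC_FAR *AddRef )(
        IExample __RPC_FAR * This);

    ULONG ( STDMETHODCALLTYPE *Release )(
        IExample * This);

    END_INTERFACE
} IExampleVtbl;
"""

for number, method in enumerate(list_vtbl_methods(SAMPLE, "IExample"), 1):
    # Print the 1-based method number, its 32-bit vtable offset, and its name.
    print("%2d  [ECX+%02X]  %s" % (number, (number - 1) * 4, method))
```

To use it on a real header, read the file into a string and pass the interface name, for example list_vtbl_methods(text, "IHTMLInputElement"), then look at entry 13.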

So, our call to [ECX+30] is actually a call to IHTMLInputElement->get_name. But it was a lot of work to get here! To make analysis quicker, I wrote a set of Perl scripts to:
  • Extract GUIDs and corresponding interface names from the registry
  • Scan a PE file for binary-encoded GUIDs and produce labels with offsets
  • Search Windows header files and list Vtbl methods in order
These files can be downloaded from http://www.openrce.org/articles/files/oleretools.zip.
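For readers who want to see the idea behind the second script without downloading it, here is a minimal Python sketch of scanning a file for binary-encoded GUIDs. It is not the author's Perl code. The function name scan_for_guids is made up, the lookup table would normally be filled from the registry, and the offsets it reports are file offsets rather than virtual addresses.

```python
import uuid


def scan_for_guids(data, known_guids):
    """Find every known GUID in a byte string.

    known_guids maps GUID strings to interface names. Returns a sorted
    list of (offset, name) pairs.
    """
    hits = []
    for guid_text, name in known_guids.items():
        # bytes_le is the mixed-endian layout GUIDs have in memory.
        needle = uuid.UUID(guid_text).bytes_le
        pos = data.find(needle)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(needle, pos + 1)
    return sorted(hits)


# Assumed lookup table with the single GUID from this article.
KNOWN = {"3050F5D2-98B5-11CF-BB82-00AA00BDCE0B": "IHTMLInputElement"}

# Synthetic data standing in for a PE file: padding, the GUID, padding.
blob = (b"\x00" * 5
        + bytes.fromhex("D2F55030B598CF11BB8200AA00BDCE0B")
        + b"\xff" * 3)

for offset, name in scan_for_guids(blob, KNOWN):
    print("%08X  IID_%s" % (offset, name))
```

For a real sample, replace blob with the contents of the file read in binary mode, and convert the reported file offsets to virtual addresses using the PE section table before labeling them in the disassembler.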

Incorporating this functionality into your favorite disassembler is left as an exercise to the reader.
