IDA Live Analysis Markup
Cody Pierce (codypierce) <cpiercetippingpointcom> Wednesday, November 1 2006 15:07.42 CST


Oftentimes when reversing a binary, there are key elements I simply do not know. As anyone who has done it knows, this is an inherent problem with static analysis: data elements that are only resolved at runtime often hinder progress when trying to understand a piece of code. So I threw together a quick tagging method that lets me easily resolve this information through live analysis. It has two parts: tagging the data you want in IDA, and recording that data from the live process. I use IDA Python for part I and the Paimei module "pydbg" for part II...you could always do part I in pure IDC if you hate IDA Python for some reason.

Part I:

I use the free-form comment of an instruction in IDA to hold my tag. Each tag begins with "**LA", which stands for live analysis if it's not clear :). After that you can pull three types of information by giving a "type" and its associated "value", separated by a ':'. The types are listed below, and they can be strung together arbitrarily.


Type    Values             Comment
----    -------            --------
'O'     0,1,2              Operand: Retrieves the value of the specified operand (0-based index)
'R'     <Register string>  Register: Displays the contents of the specified register; accepts dword, word, and byte names (e.g. eax, ax, al)
'M'     <Memory address>   Memory: Dereferences the memory address and displays its contents as a dword
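Because 'R' accepts 32-bit register names as well as their 16- and 8-bit sub-registers, a script has to know each name's size and how to carve a sub-register out of its parent. A minimal sketch of that bookkeeping is below; the helper names (`REGISTER_SIZES`, `register_size`, `sub_register_value`) are hypothetical and not taken from the actual scripts.

```python
# Hypothetical helpers: sizes (in bytes) of the 32-bit x86 general-purpose
# registers and their 16-bit and 8-bit sub-registers.
REGISTER_SIZES = {}
for _full in ("EAX", "EBX", "ECX", "EDX"):
    REGISTER_SIZES[_full] = 4              # e.g. EAX
    REGISTER_SIZES[_full[1:]] = 2          # e.g. AX
    REGISTER_SIZES[_full[1] + "L"] = 1     # e.g. AL (lowest byte)
    REGISTER_SIZES[_full[1] + "H"] = 1     # e.g. AH (bits 8-15)
for _full in ("ESI", "EDI", "EBP", "ESP"):
    REGISTER_SIZES[_full] = 4              # e.g. ESI
    REGISTER_SIZES[_full[1:]] = 2          # e.g. SI


def register_size(name):
    """Return the size in bytes of a register name such as 'eax', 'ax' or 'dh'."""
    return REGISTER_SIZES[name.upper()]


def sub_register_value(full_value, name):
    """Extract the value of `name` from the value of its parent 32-bit register."""
    size = register_size(name)
    if size == 4:
        return full_value & 0xFFFFFFFF
    if size == 2:
        return full_value & 0xFFFF
    if name.upper().endswith("H"):       # AH, BH, CH, DH: second-lowest byte
        return (full_value >> 8) & 0xFF
    return full_value & 0xFF             # AL, BL, CL, DL: lowest byte


print(register_size("dh"))                     # 1
print(hex(sub_register_value(0x10B, "ax")))    # 0x10b
```

With EAX holding 0x10b, AX reads back as 0x10b as well, which is what the live output later in this post shows for 0x010073cc.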


Here is an example of the tagging in notepad.exe:


.text:010073A4 008    call    __SEH_prolog                  ; int
.text:010073A4
.text:010073A9 084    xor     ebx, ebx                      ; **LA R:ebx
.text:010073AB 084    push    ebx                           ; lpModuleName **LA O:0
.text:010073AC 088    mov     edi, ds:GetModuleHandleA(x)
.text:010073B2 088    call    edi ; GetModuleHandleA(x)
.text:010073B4 084    cmp     word ptr [eax], 5A4Dh         ; **LA O:0, R:eax
.text:010073B9 084    jnz     short loc_10073DA             ; **LA M:77e70000
.text:010073B9
.text:010073BB 084    mov     ecx, [eax+3Ch]                ; **LA R:ecx,R:ebx,R:esi,O:1,M:77e7012e
.text:010073BE 084    add     ecx, eax
.text:010073C0 084    cmp     dword ptr [ecx], 4550h        ; **LA O:0
.text:010073C6 084    jnz     short loc_10073DA
.text:010073C6
.text:010073C8 084    movzx   eax, word ptr [ecx+18h]       ; **LA O:1
.text:010073CC 084    cmp     eax, 10Bh                     ; **LA R:eax, R:ax, R:dh
.text:010073D1 084    jz      short loc_10073F2


As you can see, the requests within a tag are separated by ',' and each type is separated from its value by ':'. Existing comments can also stay as they are; just add the tag anywhere in the comment.
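Pulling the requests back out of a comment string is a simple split on the marker, then on ',' and ':'. The sketch below is an illustration rather than the code in gen_la_config.py; `parse_la_tag` is a hypothetical name, and it assumes the tag runs from "**LA" to the end of its comment line, as in every example above.

```python
LA_MARKER = "**LA"
VALID_TYPES = ("O", "R", "M")
VALID_OPERANDS = ("0", "1", "2")


def parse_la_tag(comment):
    """Return the (type, value) requests in an IDA comment, or [] if untagged.

    Assumes the tag runs from "**LA" to the end of its comment line.
    """
    if not comment or LA_MARKER not in comment:
        return []
    # Drop any ordinary comment text before the marker and any later lines.
    body = comment.split(LA_MARKER, 1)[1].split("\n", 1)[0]
    requests = []
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        kind, sep, value = item.partition(":")
        kind, value = kind.strip().upper(), value.strip()
        if not sep or kind not in VALID_TYPES or not value:
            raise ValueError("malformed tag element: %r" % item)
        if kind == "O" and value not in VALID_OPERANDS:
            raise ValueError("operand index must be 0, 1 or 2: %r" % item)
        requests.append((kind, value))
    return requests


print(parse_la_tag("lpModuleName **LA O:0"))
# [('O', '0')]
print(parse_la_tag("**LA R:eax, R:ax, R:dh"))
# [('R', 'eax'), ('R', 'ax'), ('R', 'dh')]
```

Raising on a malformed element, rather than silently skipping it, makes a typo in a tag visible at export time instead of during a live session.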

Once you have tagged the idb appropriately, it's necessary to parse the tags into something the live part can handle. I wrote the IDA Python script so that it exports to a ','-delimited text file, which you can easily copy to any host to run the live portion. When run, the script asks for a destination file name and writes the parsed tags to that file as well as to the IDA message window. The output for the example above is below.


10073a9,r,4,EBX
10073ab,r,4,1
10073b4,p,4,1
10073b4,r,4,EAX
10073b9,p,4,77e70000
10073bb,r,4,ECX
10073bb,r,4,EBX
10073bb,r,4,ESI
10073bb,o,4,2
10073bb,p,4,77e7012e
10073c0,p,4,1
10073c8,o,4,2
10073cc,r,4,EAX
10073cc,r,2,AX
10073cc,r,1,DH


As you can see, the output is somewhat similar to the "type" tags in IDA. There are also a couple of extra fields, like size, that get auto-discovered by the IDA Python script. The fields are as follows.


Address,Type,Size,Type Data

Address: The address of the tag, and where we will break during execution
Type: Slightly different from the comment tags; an operand tag becomes whichever type matches the operand. The types are:

  'r' Register
  'p' Pointer (Memory location)
  'o' Offset

Size: Size of the requested data in bytes...currently limited to 1, 2, and 4
Type Data: The register name, memory address, or (for operand tags) operand number to read; in the example above, O:0 becomes 1 and O:1 becomes 2

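The mapping from comment tags to these rows can be sketched as follows. This is a guess reconstructed from the sample output, not the actual gen_la_config.py logic: 'R' becomes an 'r' row, 'M' becomes a 4-byte 'p' row, and 'O' becomes whichever of 'r', 'p' or 'o' fits the operand, with the operand index written 1-based. Classifying an operand needs the disassembler, so that step is a caller-supplied function (`classify_operand`, hypothetical); inside IDA it would inspect the tagged instruction's operands.

```python
def register_size(name):
    """Approximate size in bytes of an x86 register name (eax=4, ax=2, al/ah=1)."""
    name = name.upper()
    if len(name) == 3 and name.startswith("E"):
        return 4
    if len(name) == 2 and name[0] in "ABCD" and name[1] in "LH":
        return 1
    return 2


def tag_to_rows(address, requests, classify_operand):
    """Turn parsed (type, value) tag requests into (address, kind, size, data) rows.

    classify_operand(index) -> (kind, size) is a hypothetical hook that asks the
    disassembler what operand `index` of the tagged instruction is.
    """
    rows = []
    for kind, value in requests:
        if kind == "R":
            rows.append((address, "r", register_size(value), value.upper()))
        elif kind == "M":
            rows.append((address, "p", 4, value.lower()))
        elif kind == "O":
            op_kind, size = classify_operand(int(value))
            # The sample config writes operand indexes 1-based (O:0 -> 1).
            rows.append((address, op_kind, size, str(int(value) + 1)))
        else:
            raise ValueError("unknown tag type %r" % kind)
    return rows


def format_rows(rows):
    """Render rows in the comma-delimited Address,Type,Size,Type Data layout."""
    return ["%x,%s,%d,%s" % row for row in rows]


# Tag from 0x010073bb; operand 1 ([eax+3Ch]) is assumed to classify as 'o'.
rows = tag_to_rows(0x010073BB, [("R", "ecx"), ("O", "1"), ("M", "77e7012e")],
                   lambda index: ("o", 4))
print("\n".join(format_rows(rows)))
# 10073bb,r,4,ECX
# 10073bb,o,4,2
# 10073bb,p,4,77e7012e
```

Those three lines match the corresponding rows for 0x10073bb in the sample config above.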

Once generated, the file can be copied to the destination host, where it is ready for use by the live piece.
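On the live side the file has to be read back and grouped by address, since one address can carry several requests but needs only one breakpoint. A minimal loader might look like this; `load_config` and `Request` are hypothetical names, and in practice the lines would come from something like `open("la.conf")`.

```python
from collections import OrderedDict, namedtuple

Request = namedtuple("Request", "address kind size data")

VALID_KINDS = ("r", "p", "o")
VALID_SIZES = ("1", "2", "4")


def load_config(lines):
    """Parse Address,Type,Size,Type Data lines, grouped by address in file order."""
    by_address = OrderedDict()
    for number, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue
        fields = line.split(",")
        if len(fields) != 4:
            raise ValueError("line %d: expected 4 fields: %r" % (number, line))
        address, kind, size, data = fields
        if kind not in VALID_KINDS or size not in VALID_SIZES:
            raise ValueError("line %d: bad type or size: %r" % (number, line))
        request = Request(int(address, 16), kind, int(size), data)
        by_address.setdefault(request.address, []).append(request)
    return by_address


config = load_config(["10073a9,r,4,EBX", "10073b4,p,4,1", "10073b4,r,4,EAX"])
for address, requests in config.items():
    print("0x%08x" % address, [r.data for r in requests])
# 0x010073a9 ['EBX']
# 0x010073b4 ['1', 'EAX']
```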

Part II:

The live piece uses the Paimei module "pydbg" to set breakpoints on the listed addresses and dereference data. It works by reading in the list, setting breakpoints on the corresponding code addresses, and then handling the resulting exceptions. Most of the types are straightforward except operands ('O'). When an operand is requested, the script pulls the needed information from the disassembly (using pydasm) and handles it accordingly. One difference arises when requesting the first operand (the destination): the script has to wait until the instruction has completed in order to get the proper data. This is handled by deferring those requests until all others have finished, then installing a single step handler and putting the process into single step mode. The single step handler then reads the operand just like any other, and afterwards restores the previous single step state. The command line for this piece is as follows.


live_analysis.py <process name> <live analysis config file>

C:\Code\Python\live_analysis>live_analysis.py c:\windows\notepad.exe la.conf
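The ordering trick for destination operands can be modeled without a debugger. The sketch below is a hypothetical, debugger-agnostic model, not the pydbg code in live_analysis.py: `read` stands in for whatever fetches a value from the stopped thread, and the caller is expected to enable single stepping when `on_breakpoint` returns True and to restore the state returned by `on_single_step`. Requests use the comment-tag form, so ("O", "0") is the first (destination) operand.

```python
DESTINATION = ("O", "0")   # the first operand, treated as the destination


class DeferredOperandRecorder(object):
    """Model of breakpoint/single-step handling that defers destination reads."""

    def __init__(self, requests_by_address):
        self.requests_by_address = requests_by_address
        self.pending = []            # (address, request) pairs awaiting the step
        self.saved_stepping = False  # single step state to restore afterwards
        self.log = []                # (address, request, value) records

    def on_breakpoint(self, address, read, stepping):
        """Read everything valid before execution; defer destination operands.

        Returns True if the caller should single-step over this instruction.
        """
        for request in self.requests_by_address.get(address, []):
            if request == DESTINATION:
                self.pending.append((address, request))
            else:
                self.log.append((address, request, read(request)))
        if self.pending:
            self.saved_stepping = stepping
            return True
        return False

    def on_single_step(self, read):
        """Read deferred destinations now that the instruction has executed.

        Returns the single step state the caller should restore.
        """
        for address, request in self.pending:
            self.log.append((address, request, read(request)))
        self.pending = []
        return self.saved_stepping


# Simulate "xor ebx, ebx" at 0x010073a9 tagged with O:0 and R:ebx.
state = {"EBX": 0x7FFD8000}
read = lambda request: state["EBX"]   # both requests resolve to EBX here
recorder = DeferredOperandRecorder({0x010073A9: [("O", "0"), ("R", "ebx")]})
if recorder.on_breakpoint(0x010073A9, read, stepping=False):
    state["EBX"] = 0                  # the instruction executes
    restore = recorder.on_single_step(read)
for address, request, value in recorder.log:
    print("0x%08x %s:%s = 0x%x" % (address, request[0], request[1], value))
# 0x010073a9 R:ebx = 0x7ffd8000
# 0x010073a9 O:0 = 0x0
```

The register read sees EBX before the xor, while the deferred destination read sees the result, which is exactly why the destination has to wait for the single step.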


The live analysis script first checks whether the process is already running and, if so, attaches to it; otherwise it loads the process from the path you specify. This allows easy analysis of services and critical processes.
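The attach-or-load decision can be sketched as a small pure function; in pydbg the two outcomes would map onto its attach() and load() methods. `choose_start_mode` is hypothetical, and the list of (pid, name) pairs is an assumed shape for the process enumeration, so the real script may differ. `ntpath` is used so Windows paths split correctly on any host.

```python
import ntpath


def choose_start_mode(path, processes):
    """Decide whether to attach to a running copy of `path` or load it fresh.

    `processes` is assumed to be a list of (pid, name) pairs.
    """
    wanted = ntpath.basename(path).lower()
    for pid, name in processes:
        if name.lower() == wanted:
            return ("attach", pid)
    return ("load", path)


print(choose_start_mode(r"c:\windows\notepad.exe", [(4, "System"), (1234, "notepad.exe")]))
# ('attach', 1234)
print(choose_start_mode(r"c:\windows\notepad.exe", [(4, "System")]))
# ('load', 'c:\\windows\\notepad.exe')
```

Matching on the base name only is a simplification: two different binaries with the same file name would be treated as the same process.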

Once attached or loaded, the script sets the appropriate breakpoints and handlers. Some output of the script is below.


C:\Code\Python\live_analysis>live_analysis.py c:\windows\notepad.exe la.conf
[*] Trying to attach to existing notepad.exe
[*] Trying to load c:\windows\notepad.exe
[*] Setting bp @ 0x010073a9
[*] Setting bp @ 0x010073ab
[*] Setting bp @ 0x010073b4
[*] Setting bp @ 0x010073b9
[*] Setting bp @ 0x010073bb
[*] Setting bp @ 0x010073c0
[*] Setting bp @ 0x010073c8
[*] Setting bp @ 0x010073cc
[*] Setting bp @ 0x0100752d
[*] Setting bp @ 0x0100752f
[*] Setting bp @ 0x01007531
[*] Setting bp @ 0x01007534
[*] Setting bp @ 0x01007535
[*] Setting bp @ 0x0100753b
[*] Setting bp @ 0x0100753c
[*] Setting bp @ 0x0100753e
[*] Setting bp @ 0x01007541
[*] Setting bp @ 0x01007544
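Note that several requests can share one address, but each address gets a single breakpoint. A sketch of deriving that list from config rows is below; the function names are hypothetical, the addresses are sorted here for stable output (the real script may simply follow file order), and each address would then be handed to something like pydbg's bp_set().

```python
def breakpoint_addresses(rows):
    """Unique breakpoint addresses from (address, kind, size, data) rows, ascending."""
    return sorted(set(row[0] for row in rows))


def describe_breakpoints(rows):
    """Status lines in the same style as the script output above."""
    return ["[*] Setting bp @ 0x%08x" % address for address in breakpoint_addresses(rows)]


rows = [(0x010073A9, "r", 4, "EBX"),
        (0x010073B4, "p", 4, "1"),
        (0x010073B4, "r", 4, "EAX")]
print("\n".join(describe_breakpoints(rows)))
# [*] Setting bp @ 0x010073a9
# [*] Setting bp @ 0x010073b4
```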


As expected, when those breakpoints are hit the requested data is recorded and output. This is the one part of the script I'm not happy with: the output could be put to better use in another format, which I have yet to decide on. Some options are loading it into a database or importing it back into IDA. I did write a cheap hack to generate an IDC script from the output, but it is neither tested nor well designed. The resulting output is below.


[*] 0x010073a9      EBX [Reg    ] is 0x7ffd8000 [4]
[*] 0x010073ab        1 [Reg    ] is 0x0        [4]
[*] 0x010073b4      EAX [Reg    ] is 0x1000000  [4]
[*] 0x010073b4        1 [Pointer] is 0x905a4d   [4]
[*] 0x010073b9 77e70000 [Pointer] is 0x905a4d   [4]
[*] 0x010073bb      ECX [Reg    ] is 0x7ffb0    [4]
[*] 0x010073bb      EBX [Reg    ] is 0x0        [4]
[*] 0x010073bb      ESI [Reg    ] is 0x1e06380f [4]
[*] 0x010073bb        2 [Offset ] is 0xe0       [4]
[*] 0x010073bb 77e7012e [Pointer] is 0x40001    [4]
[*] 0x010073c0        1 [Pointer] is 0x4550     [4]
[*] 0x010073c8        2 [Offset ] is 0xa07010b  [4]
[*] 0x010073cc      EAX [Reg    ] is 0x10b      [4]
[*] 0x010073cc       AX [Reg    ] is 0x10b      [2]
[*] 0x010073cc       DH [Reg    ] is 0xeb       [1]


As you can see, all of the tagged data has been captured and displayed for your reversing needs.
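Each output line above fits a single format string. The sketch below reproduces that column layout; `format_record` and its argument order are hypothetical, reconstructed from the sample output rather than taken from live_analysis.py. Keeping the formatting in one function would also make it easier to swap in a different output format later, such as database rows.

```python
LABELS = {"r": "Reg", "p": "Pointer", "o": "Offset"}


def format_record(address, data, kind, value, size):
    """Format one captured value in the column layout of the output above."""
    return "[*] 0x%08x %8s [%-7s] is %-10s [%d]" % (
        address, data, LABELS[kind], hex(value), size)


print(format_record(0x010073A9, "EBX", "r", 0x7FFD8000, 4))
# [*] 0x010073a9      EBX [Reg    ] is 0x7ffd8000 [4]
```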

To wrap it up, this is pretty handy for filling in dynamically resolved pieces of a binary you are statically reversing. Combining the two (static and live) is something I try to do as often as possible, so that I can save myself from switching between IDA and WinDbg or something similar (although it could be said I now have to switch between IDA and the LA scripts). Still, I find this much faster and easier on the eyes (the eyes part is also debatable :)). As stated above, I'm not in love with the output format, and will eventually find something that fits better. In the future I would also like to expand this into a comprehensive IML (IDA Markup Language) that brings the static and live methods even closer together. Anyway, email or message me if you have suggestions or improvements.


gen_la_config.py  IDA Python script that generates the needed live analysis config file.
live_analysis.py  Live analysis command line script which records the appropriate data.
Paimei  The Paimei framework, needed for the live analysis portion (only pydbg is actually required).
