📚 OpenRCE is preserved as a read-only archive. Launched at RECon Montreal in 2005. Registration and posting are disabled.









Created: Wednesday, November 1 2006 15:07.42 CST Modified: Wednesday, November 1 2006 15:24.01 CST
IDA Live Analysis Markup
Author: codypierce # Views: 2807

When reversing a binary I often run into key elements whose values I do not know.  This is an inherent problem with static analysis, as anyone who has done it knows: data that is only resolved at runtime hinders progress when trying to understand a piece of code.  So I threw together a quick tagging method that lets me resolve this information through live analysis.  It has two parts: tagging the data you want in IDA, and recording that data from the live process.  I use IDA Python for part I and the Paimei module "pydbg" for part II.  You could always do part I in pure IDC if you dislike IDA Python for some reason.

Part I:

I use the free-form comments of an instruction in IDA to hold my tag.  Each tag begins with "**LA", which stands for live analysis if it's not clear :).  After that you can pull three types of information by giving a "type" and its associated "value", separated by a ':'.  The table below lists these types; tags of any type can be strung together arbitrarily.


Type    Values             Comment
----    -------            --------
'O'     0,1,2              Operand: Enumerates the operand specified and retrieves its value
'R'     <Register string>  Register: Displays the contents of the specified register, accepts dword, word, byte representations
'M'     <Memory address>   Memory:   Dereferences and displays the contents of the memory address as a dword
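
To make the 'R' and 'M' rows concrete, here is a minimal sketch of how a dword, word, or byte register view can be cut out of a 32-bit register value, and how four raw bytes become a dword.  It is not taken from the scripts linked at the end of this post; the names REGISTER_VIEWS, read_register and read_dword are made up for illustration, and the register context is assumed to be a plain dict of 32-bit values.

```python
import struct


def build_register_views():
    """Map a register name to (parent register, right shift, mask, size in bytes)."""
    views = {}
    for letter in "ABCD":
        parent = "E%sX" % letter
        views[parent] = (parent, 0, 0xFFFFFFFF, 4)       # e.g. EAX
        views[letter + "X"] = (parent, 0, 0xFFFF, 2)     # e.g. AX
        views[letter + "L"] = (parent, 0, 0xFF, 1)       # e.g. AL
        views[letter + "H"] = (parent, 8, 0xFF, 1)       # e.g. AH
    for parent in ("ESI", "EDI", "EBP", "ESP"):
        views[parent] = (parent, 0, 0xFFFFFFFF, 4)
        views[parent[1:]] = (parent, 0, 0xFFFF, 2)       # SI, DI, BP, SP
    return views


REGISTER_VIEWS = build_register_views()


def read_register(context, name):
    """Return (value, size) for a register name such as 'eax', 'ax' or 'dh'.

    context is assumed to be a dict of 32-bit register values keyed by
    upper-case name, e.g. {"EAX": 0x10b}.
    """
    parent, shift, mask, size = REGISTER_VIEWS[name.upper()]
    return (context[parent] >> shift) & mask, size


def read_dword(raw):
    """Interpret the first four bytes of raw as a little-endian dword."""
    return struct.unpack("<L", raw[:4])[0]
```

For example, read_dword(b"MZ\x90\x00") gives 0x905a4d, which is what a dword read at the start of a PE image looks like.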


Here is an example of the tagging in notepad.exe


.text:010073A4 008    call    __SEH_prolog                  ; int
.text:010073A4
.text:010073A9 084    xor     ebx, ebx                      ; **LA R:ebx
.text:010073AB 084    push    ebx                           ; lpModuleName **LA O:0
.text:010073AC 088    mov     edi, ds:GetModuleHandleA(x)
.text:010073B2 088    call    edi ; GetModuleHandleA(x)
.text:010073B4 084    cmp     word ptr [eax], 5A4Dh         ; **LA O:0, R:eax
.text:010073B9 084    jnz     short loc_10073DA             ; **LA M:77e70000
.text:010073B9
.text:010073BB 084    mov     ecx, [eax+3Ch]                ; **LA R:ecx,R:ebx,R:esi,O:1,M:77e7012e
.text:010073BE 084    add     ecx, eax
.text:010073C0 084    cmp     dword ptr [ecx], 4550h        ; **LA O:0
.text:010073C6 084    jnz     short loc_10073DA
.text:010073C6
.text:010073C8 084    movzx   eax, word ptr [ecx+18h]       ; **LA O:1
.text:010073CC 084    cmp     eax, 10Bh                     ; **LA R:eax, R:ax, R:dh
.text:010073D1 084    jz      short loc_10073F2


As you can see, tags are separated by ',' and each type is separated from its value by ':'.  Existing comments can stay as they are; the tag can be added anywhere in the comment.
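
The tag format above is simple enough to parse with a few lines of plain Python.  The following is only a sketch of one way to do it, not the code from gen_la_config.py; parse_la_tags is a hypothetical name, and the comment string is assumed to have already been read out of IDA.

```python
import re

TAG_MARKER = "**LA"
# One tag: a type letter (O, R or M), a ':', then an alphanumeric value.
TAG_PATTERN = re.compile(r"([ORM])\s*:\s*([0-9A-Za-z]+)")


def parse_la_tags(comment):
    """Return the (type, value) pairs that follow the **LA marker in a comment."""
    marker_at = comment.find(TAG_MARKER)
    if marker_at == -1:
        return []                      # ordinary comment, nothing tagged
    tag_text = comment[marker_at + len(TAG_MARKER):]
    tags = []
    for piece in tag_text.split(","):
        match = TAG_PATTERN.match(piece.strip())
        if match:
            tags.append((match.group(1), match.group(2)))
    return tags
```

Because the function only looks at what follows the marker, an existing comment such as "lpModuleName" in front of the tag is left alone.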

Once you have tagged the idb appropriately, it needs to be parsed into something the live part can handle.  I wrote the IDA Python script so that it exports a ','-delimited text file, which you can easily copy to any host and run the live portion against.  When run, the script asks for a destination file name and writes the parsed tags to that file as well as to the IDA message window.  The output for the example above is below.


10073a9,r,4,EBX
10073ab,r,4,1
10073b4,p,4,1
10073b4,r,4,EAX
10073b9,p,4,77e70000
10073bb,r,4,ECX
10073bb,r,4,EBX
10073bb,r,4,ESI
10073bb,o,4,2
10073bb,p,4,77e7012e
10073c0,p,4,1
10073c8,o,4,2
10073cc,r,4,EAX
10073cc,r,2,AX
10073cc,r,1,DH


As you can see, the output is somewhat similar to the "type" tags in IDA.  A couple of additional fields, such as size, are discovered automatically by the IDA Python script.  The fields are as follows.


Address,Type,Size,Type Data

Address: The address of the tag, and where we will break during execution
Type: Slightly different from the comment tags; the values are as follows

  'r' Register
  'p' Pointer (Memory location)
  'o' Offset

Size: Size of the requested data; currently limited to 1, 2, and 4 bytes
Type Data: The needed information about what is requested
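
Reading this file back on the live host could look roughly like the sketch below.  It is not the parser from live_analysis.py; Record, KIND_NAMES and parse_config are illustrative names, and the only rules enforced are the ones stated above (hex address, type of r/p/o, size of 1, 2 or 4).

```python
from collections import namedtuple

Record = namedtuple("Record", "address kind size data")

KIND_NAMES = {"r": "Register", "p": "Pointer", "o": "Offset"}


def parse_config(text):
    """Turn the ','-delimited config text into a list of Record tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                               # tolerate blank lines
        address, kind, size, data = line.split(",")
        if kind not in KIND_NAMES:
            raise ValueError("unknown type %r in %r" % (kind, line))
        size = int(size)
        if size not in (1, 2, 4):
            raise ValueError("unsupported size %d in %r" % (size, line))
        # Addresses are written as bare hex, e.g. 10073a9.
        records.append(Record(int(address, 16), kind, size, data))
    return records
```

Keeping the address as an integer makes it directly usable as a breakpoint address later on.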


Once the config file has been generated, it can be copied to the destination host and is ready for use in the live piece.

Part II:

The live piece uses the Paimei module "pydbg" to set breakpoints on the listed addresses and dereference data.  It operates by reading in the list, setting breakpoints on the proper code section addresses, and then handling the resulting exceptions.  Most of the "types" are straightforward except operands ('O').  When an operand is requested, the script pulls the needed information from the disassembly (using pydasm) and works on it accordingly.  One slight difference is that when the first operand (the destination) is requested, the script has to wait until the instruction has completed in order to get the proper data.  This is handled by delaying those requests until all others have finished, then installing a single-step handler and putting the process into single-step mode.  The single-step handler then does the same as for all other operands and restores the previous single-step state afterwards.  The command line for this piece is as follows.


live_analysis.py <process name> <live analysis config file>

C:\Code\Python\live_analysis>live_analysis.py c:\windows\notepad.exe la.conf


The live analysis script first checks whether the process is already running and, if so, attaches to it.  If the process is not running, it loads it from the path you specify.  This allows easy analysis of services and critical processes.
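
The attach-or-load decision can be sketched without a debugger at all.  In the snippet below, choose_target is a hypothetical helper, and the running list is assumed to be (pid, executable name) pairs, which to my understanding is the shape pydbg's enumerate_processes() returns; the actual attach(pid) or load(path) call is left to the caller.

```python
import ntpath


def choose_target(process_path, running):
    """Decide whether to attach to a running process or load a new one.

    running is assumed to be a list of (pid, executable name) tuples.
    Returns ("attach", pid) or ("load", process_path).
    """
    # ntpath handles Windows-style paths regardless of the host OS.
    wanted = ntpath.basename(process_path).lower()
    for pid, name in running:
        if name.lower() == wanted:
            return ("attach", pid)
    return ("load", process_path)
```

Matching on the lower-cased base name is a design choice of this sketch, made because Windows executable names are case-insensitive.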

Once attached or loaded, the script sets the appropriate breakpoints and handlers.  Some output of the script is below.


C:\Code\Python\live_analysis>live_analysis.py c:\windows\notepad.exe la.conf
[*] Trying to attach to existing notepad.exe
[*] Trying to load c:\windows\notepad.exe
[*] Setting bp @ 0x010073a9
[*] Setting bp @ 0x010073ab
[*] Setting bp @ 0x010073b4
[*] Setting bp @ 0x010073b9
[*] Setting bp @ 0x010073bb
[*] Setting bp @ 0x010073c0
[*] Setting bp @ 0x010073c8
[*] Setting bp @ 0x010073cc
[*] Setting bp @ 0x0100752d
[*] Setting bp @ 0x0100752f
[*] Setting bp @ 0x01007531
[*] Setting bp @ 0x01007534
[*] Setting bp @ 0x01007535
[*] Setting bp @ 0x0100753b
[*] Setting bp @ 0x0100753c
[*] Setting bp @ 0x0100753e
[*] Setting bp @ 0x01007541
[*] Setting bp @ 0x01007544
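
Note that one breakpoint is set per address even when several config lines share that address.  A sketch of that grouping, together with the deferral of first-operand requests described earlier, is below.  plan_breakpoints is a hypothetical name, and treating a Type Data value of "1" as "first operand" is an assumption read off the sample config, not something confirmed by the script itself.

```python
from collections import OrderedDict


def plan_breakpoints(records):
    """Group (address, type, size, data) records by breakpoint address.

    Returns an OrderedDict mapping address -> (immediate, deferred).
    ASSUMPTION: a record whose data field is "1" refers to the first
    (destination) operand, so it is deferred until after a single step.
    """
    plan = OrderedDict()
    for record in records:
        address, kind, size, data = record
        immediate, deferred = plan.setdefault(address, ([], []))
        if data == "1":
            deferred.append(record)    # read after the instruction has executed
        else:
            immediate.append(record)   # read when the breakpoint is hit
    return plan
```

A breakpoint handler would then report the immediate list right away, and only if the deferred list is non-empty switch the thread into single-step mode and report the rest from the single-step handler.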


As expected, when those breakpoints are hit the requested data is recorded and output.  This is my one point of contention with the script: the output could be better utilized in another format, which I have yet to decide on.  Some suggestions are loading it into a database or importing it back into IDA.  I did write a cheap hack to generate an IDC script from the output, but it is not tested or well designed.  The resulting output is below.


[*] 0x010073a9      EBX [Reg    ] is 0x7ffd8000 [4]
[*] 0x010073ab        1 [Reg    ] is 0x0        [4]
[*] 0x010073b4      EAX [Reg    ] is 0x1000000  [4]
[*] 0x010073b4        1 [Pointer] is 0x905a4d   [4]
[*] 0x010073b9 77e70000 [Pointer] is 0x905a4d   [4]
[*] 0x010073bb      ECX [Reg    ] is 0x7ffb0    [4]
[*] 0x010073bb      EBX [Reg    ] is 0x0        [4]
[*] 0x010073bb      ESI [Reg    ] is 0x1e06380f [4]
[*] 0x010073bb        2 [Offset ] is 0xe0       [4]
[*] 0x010073bb 77e7012e [Pointer] is 0x40001    [4]
[*] 0x010073c0        1 [Pointer] is 0x4550     [4]
[*] 0x010073c8        2 [Offset ] is 0xa07010b  [4]
[*] 0x010073cc      EAX [Reg    ] is 0x10b      [4]
[*] 0x010073cc       AX [Reg    ] is 0x10b      [2]
[*] 0x010073cc       DH [Reg    ] is 0xeb       [1]


As you can see all of the tagged data has been captured and displayed for your reversing needs.
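
As one possible take on importing the results back into IDA, the sketch below turns output lines like the ones above into an IDC script that comments each address.  This is not the untested hack mentioned earlier; output_to_idc is a made-up name, the "LA:" comment prefix is arbitrary, and MakeComm is assumed to be available as the IDC comment call in the IDA version in use.

```python
import re

# Matches e.g. "[*] 0x010073a9      EBX [Reg    ] is 0x7ffd8000 [4]"
LINE_PATTERN = re.compile(
    r"\[\*\]\s+0x([0-9a-fA-F]+)\s+(\S+)\s+\[(\w+)\s*\]"
    r"\s+is\s+(0x[0-9a-fA-F]+)\s+\[(\d)\]"
)


def output_to_idc(lines):
    """Build IDC source that writes one comment per recorded address."""
    comments = {}
    order = []                                   # keep first-seen address order
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue                             # skip "Setting bp" and other lines
        address = int(match.group(1), 16)
        if address not in comments:
            comments[address] = []
            order.append(address)
        comments[address].append("%s=%s" % (match.group(2), match.group(4)))
    body = ['    MakeComm(0x%08x, "LA: %s");' % (address, ", ".join(comments[address]))
            for address in order]
    return "\n".join(["#include <idc.idc>", "static main() {"] + body + ["}"])
```

One caveat: MakeComm replaces the existing comment, so running the generated script as-is would overwrite the original **LA tags.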

To wrap up, this is pretty handy for filling in dynamically resolved pieces of a binary you are statically reversing.  Combining the two (static and live) is something I try to do as often as possible so that I can save myself from going back and forth between IDA and WinDbg or something similar (although it could be said I now have to go between IDA and the LA stuff).  However, I find this much faster and easier on the eyes (the eyes part is also debatable :)).  As stated above, I'm not in love with the output format and will eventually find something that fits better.  In the future I would also like to expand this into a comprehensive IML (IDA Markup Language) that brings the static and live methods further together.  Anyway, email or message me if you have suggestions or improvements.


gen_la_config.py  IDA Python script that generates the needed live analysis config file.
live_analysis.py  Live analysis command line script which records the appropriate data.
Paimei  Needed for the live analysis portion (actually only pydbg is needed).



