Novel approach to binary analysis on UNIX
Cody Pierce (codypierce) <cpiercetippingpointcom> Thursday, October 26 2006 15:08.02 CDT


I had been thinking about the power of UNIX when processing data on the command line.  find, grep, awk, etc. are all well designed for this purpose.  With that in mind, I came up with a novel approach to representing a binary in UNIX.  Since every binary can be broken down into functions, basic blocks, and instructions, I wrote a script that takes a PaiMei PIDA file and writes it out as a UNIX directory structure.  This lets a user navigate a binary from whatever UNIX environment they prefer, using the powerful command line utilities.

The script creates a directory for each function and, inside it, a directory for each basic block, which in turn contains a file for each instruction.  Each file and directory is named by its address, and each instruction file contains the disassembly of that instruction.  For instance:


$ find -L .
./0x401000                    <--- Function
./0x401000/0x401000           <--- Basic Block
./0x401000/0x401000/0x401000  <--- Instructions
./0x401000/0x401000/0x401002  <
./0x401000/0x401000/0x401008  <
./0x401000/0x401000/0x401009  <
./0x401000/0x401000/0x40100e  <
./0x401000/0x401000/0x401010  <
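
The layout above can be sketched in a few lines of Python. This is only an illustration of the directory scheme, not the actual module2dir.py: it assumes the module has already been loaded into a plain nested dictionary, and the name write_module and that dictionary shape are made up for the example (the real script reads a PaiMei PIDA file instead).

```python
import os


def write_module(functions, out_dir):
    """Write a module as function/basic-block directories and instruction files.

    `functions` is a hypothetical in-memory form, assumed for this sketch:
    {function_addr: {block_addr: {instruction_addr: disassembly_text}}}
    """
    for func_addr, blocks in functions.items():
        for block_addr, instructions in blocks.items():
            # One directory per function, one sub-directory per basic block.
            block_dir = os.path.join(out_dir, "0x%x" % func_addr, "0x%x" % block_addr)
            os.makedirs(block_dir, exist_ok=True)
            for ins_addr, disasm in instructions.items():
                # One file per instruction, named by address, holding its disassembly.
                with open(os.path.join(block_dir, "0x%x" % ins_addr), "w") as fh:
                    fh.write(disasm)
```

Calling write_module({0x401050: {0x401050: {0x401050: "push ebp"}}}, "out") would create out/0x401050/0x401050/0x401050 containing the text "push ebp".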


You can also print the contents of a basic block or function like so.


$ for i in *; do echo -n "$i: "; cat "$i"; echo; done
0x401050: push ebp
0x401051: mov ebp [esp+arg_0]
0x401055: push esi
0x401056: mov eax ebp
0x401058: push edi
0x401059: lea edx [eax+1]
0x40105c: lea esp [esp+0]
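
The same listing can be produced from Python if you would rather script it. The sketch below assumes the layout described in this post (one file per instruction, named by hex address); format_block is a hypothetical helper name, not part of the script. It sorts by numeric address rather than by name, so the order stays right even if addresses differ in length.

```python
import os


def format_block(block_dir):
    """Return 'address: disassembly' lines for one basic block directory."""
    # Keep only regular files; sub-directories and directory symlinks are skipped.
    names = [n for n in os.listdir(block_dir)
             if os.path.isfile(os.path.join(block_dir, n))]
    lines = []
    # File names are hex addresses such as 0x401050, so sort them numerically.
    for name in sorted(names, key=lambda n: int(n, 16)):
        with open(os.path.join(block_dir, name)) as fh:
            lines.append("%s: %s" % (name, fh.read().strip()))
    return "\n".join(lines)
```

print(format_block("0x401050/0x401050")) would then give output in the same form as the shell loop above.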


I also create a symlink for every call inside a basic block, linking it to the called function's directory.  In the listing below, a directory with an "_" in its name is the call address followed by its destination function.


$ find -L . -type d
./0x402140/0x4023b9/0x4023c0_0x401050
./0x402140/0x4023b9/0x4023c0_0x401050/0x401050
./0x402140/0x4023b9/0x4023c0_0x401050/0x401060
./0x402140/0x4023b9/0x4023c0_0x401050/0x401067
./0x402140/0x4023b9/0x4023c0_0x401050/0x401070
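
Creating such a link could look roughly like the following. This is a sketch under assumptions: link_call is a hypothetical name, and using a relative link target (so the output tree can be moved around) is my choice for the example; the post does not say whether the real script uses relative or absolute targets.

```python
import os


def link_call(out_dir, func_addr, block_addr, call_addr, dest_addr):
    """Create <call>_<dest> inside a basic block, pointing at the callee's directory."""
    block_dir = os.path.join(out_dir, "0x%x" % func_addr, "0x%x" % block_addr)
    link_path = os.path.join(block_dir, "0x%x_0x%x" % (call_addr, dest_addr))
    # Relative target (an assumption): from the block directory, go up past the
    # function directory to the output root, then down into the called function.
    target = os.path.join("..", "..", "0x%x" % dest_addr)
    if not os.path.lexists(link_path):
        os.symlink(target, link_path)
    return link_path
```

With the addresses from the listing above, link_call("out", 0x402140, 0x4023b9, 0x4023c0, 0x401050) would create out/0x402140/0x4023b9/0x4023c0_0x401050 pointing at out/0x401050, which is why find needs -L to descend into it.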


Here is another simple example, this one looking for interesting library calls.


$ find . -type f -exec egrep -H 'sprintf|sscanf|recv|bind|accept' {} \;
./0x401050/0x401094/0x4010aa:call ds:_imp__sprintf
./0x401700/0x401700/0x401716:call ds:_imp__sprintf
./0x404850/0x4048e6/0x4048f9:call ds:_imp__accept
./0x406090/0x4061d9/0x4061e6:call ds:_imp__sscanf
./0x406090/0x40624d/0x40625a:call ds:_imp__sscanf
./0x4062e0/0x406340/0x406353:call ds:_imp__sscanf
./0x406c90/0x406d64/0x406d7a:call ds:_imp__sprintf
./0x404060/0x4040a7/0x4040b1:call ds:_imp__recv
./0x40dd20/0x40dd62/0x40dd6b:call ds:_imp__bind
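
For anything more involved than a one-liner, the same search is easy to do from Python. The sketch below assumes the directory layout from this post; find_calls is a hypothetical helper name. Like find without -L, os.walk does not descend into the call symlinks by default, so each instruction is reported once.

```python
import os
import re


def find_calls(root, names):
    """Return sorted (path, disassembly) pairs for instructions mentioning any name."""
    pattern = re.compile("|".join(re.escape(n) for n in names))
    hits = []
    # os.walk does not follow directory symlinks unless followlinks=True.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path) as fh:
                text = fh.read().strip()
            if pattern.search(text):
                hits.append((path, text))
    return sorted(hits)
```

find_calls(".", ["sprintf", "sscanf", "recv", "bind", "accept"]) returns the matching instruction files together with their disassembly, which can then be grouped by function or basic block as needed.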


Kinda funny.  The script is available at the link below; it takes a PaiMei PIDA file and an output directory.

https://www.openrce.org/repositories/users/codypierce/module2dir.py
Paimei
