IDA Live Analysis Markup
Often when reversing a binary there are key elements I do not know. As anyone knows, this is an inherent problem with static analysis: data elements that are resolved at runtime often hinder progress when trying to understand a piece of code. So I threw together a quick tagging method that lets me easily resolve this information through live analysis. It has two parts: tagging the data you want in IDA, and recording that data from the live process. I use IDA Python for Part I and the PaiMei module "pydbg" for Part II. You could always do Part I in pure IDC if you dislike IDA Python for some reason.

Part I: I use the free-form comment of an instruction in IDA to hold my tag. Each tag begins with "**LA", which stands for live analysis in case it's not clear :). After that you can pull three types of information by giving a "type" and its associated "value", separated by a ':'. The types are listed below, and they can be strung together arbitrarily. Here is an example of the tagging in notepad.exe. As you can see, the tags are ','-separated and the values are ':'-separated. Existing comments can stay as they are; just add the tag anywhere you like.

Once you have tagged the idb appropriately, it needs to be parsed into something the live part can handle. I wrote the IDA Python script so that it exports to a ','-delimited text file, which you can easily copy to any host before running the live portion. When run, the script asks for a destination file name and writes the parsed tags both to that file and to the IDA message window. An example of the output for the tagging above is below. As you can see, the output is similar to the "type" tags in IDA. The IDA Python script also auto-discovers a couple of extra fields, such as size. The fields are as follows. Once the file has been generated, it can be copied to the destination host and is ready for use by the live piece.
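The post does not include gen_la_config.py or the list of types, so what follows is only a minimal sketch of the parse-and-export step under stated assumptions. It assumes a tag is a single whitespace-free token such as **LA,O:0,R:eax. That syntax is a guess: only the 'O' (operand) type is named in the post, and 'R' is a hypothetical placeholder. It also assumes a column order of address, type, value, size for the exported lines, and it uses a stub size of 4 where the real script discovers the size from IDA. The function names are hypothetical. Inside IDA, the address-to-comment mapping could be filled from idc.get_cmt(ea, 0); here it is passed in so the logic runs anywhere.

```python
import re

# Hypothetical tag syntax: "**LA" immediately followed by comma-separated
# "type:value" pairs, e.g. "**LA,O:0,R:eax". Only 'O' (operand) is named in
# the post; 'R' is an illustrative placeholder type.
LA_TAG = re.compile(r"\*\*LA(\S*)")


def parse_la_tag(comment):
    """Return [(type, value), ...] for the first **LA tag in a comment, or []."""
    match = LA_TAG.search(comment or "")
    if not match:
        return []
    pairs = []
    for field in match.group(1).split(","):
        if not field:
            continue  # tolerate the leading comma in "**LA," and trailing commas
        kind, _, value = field.partition(":")
        pairs.append((kind, value))
    return pairs


def build_config(comments, size_of=lambda ea, kind, value: 4):
    """Turn {address: comment} into comma-delimited config lines.

    size_of stands in for the size the real script discovers from IDA; the
    default of 4 is a placeholder. The column order (address, type, value,
    size) is an assumption, not the post's documented format.
    """
    lines = []
    for ea in sorted(comments):
        for kind, value in parse_la_tag(comments[ea]):
            lines.append("0x%08X,%s,%s,%d" % (ea, kind, value, size_of(ea, kind, value)))
    return lines
```

Each returned line would then be written to the file the script prompts for and echoed to the message window. Comments without a tag simply produce nothing, so existing comments can stay untouched.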
Part II: The live piece uses the PaiMei module "pydbg" to set breakpoints on the listed addresses and dereference data. It reads in the list, sets breakpoints on the corresponding code addresses, and then handles the resulting exceptions. Most of the "types" are straightforward except operands ('O'). When an operand is requested, the script pulls the needed information from the disassembly (using pydasm) and works on it accordingly. One difference arises when the first operand (the destination) is requested: the script has to wait until the instruction has completed in order to get the proper data. This is handled by delaying those requests until all the others have finished, then installing a single-step handler and putting the process into single-step mode. The single-step handler reads the operand just like any other and then restores the previous single-step state.

The command line for this piece is as follows. The live analysis script first checks whether the process is running and, if so, attaches to it; if the process is not running, it loads it from the path you specify. This makes it easy to analyze services and critical processes. Once attached or loaded, the script sets the appropriate breakpoints and handlers. Some output of the script is below. As expected, when those breakpoints are hit, the requested data is recorded and output.

This is my one reservation about the script: the output could be put to better use in another format, which I have yet to decide on. Some options are loading it into a database or importing it back into IDA. I did write a cheap hack to generate an IDC script from the output, but it is neither tested nor well designed. The resulting output is below. As you can see, all of the tagged data has been captured and displayed for your reversing needs. To wrap up, this is pretty handy for filling in the dynamically resolved pieces of a binary you are statically reversing.
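On the live side, the list first has to be grouped so that each address gets a single breakpoint. Below is a minimal sketch with a hypothetical function name. It assumes the same hypothetical "address,type,value,size" layout and assumes operand index 0 denotes the first (destination) operand; neither detail is confirmed by the post. Each address maps to two request lists. The first is read as soon as the breakpoint fires. The second is deferred until the instruction has executed.

```python
from collections import defaultdict


def load_plan(lines):
    """Group config lines by address into (immediate, deferred) request lists.

    Assumes a hypothetical "address,type,value,size" layout. An 'O' request for
    operand 0 (assumed to be the destination) is deferred until after the
    instruction executes; everything else can be read at the breakpoint.
    """
    plan = defaultdict(lambda: ([], []))
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        addr, kind, value, size = line.split(",")
        request = (kind, value, int(size))
        immediate, deferred = plan[int(addr, 16)]  # int() accepts the "0x" prefix
        if kind == "O" and value == "0":
            deferred.append(request)
        else:
            immediate.append(request)
    return dict(plan)
```

The keys of the returned dict are exactly the addresses that need breakpoints. Addresses with a non-empty deferred list are the ones that will need the single-step trick.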
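The deferral of destination operands can be sketched as a small state machine. This is a hypothetical helper, not the author's code. "plan" maps each address to a pair of (immediate, deferred) request lists. "read" returns the current value for a request, and "set_single_step" toggles trap mode. With pydbg these callables would be thin wrappers around register or dbg.read_process_memory(address, length) reads and dbg.single_step(flag). The two methods would be hooked up roughly through dbg.bp_set(address, handler=...) and dbg.set_callback(EXCEPTION_SINGLE_STEP, ...). Keeping them as plain callables lets the logic be exercised without a debugger.

```python
class DeferredOperandHandler:
    """Breakpoint/single-step bookkeeping for deferred destination operands."""

    def __init__(self, plan, read, set_single_step):
        self.plan = plan                    # {address: (immediate, deferred)}
        self.read = read                    # request -> current value
        self.set_single_step = set_single_step
        self.pending = None                 # (address, deferred requests) awaiting a step
        self.prev_stepping = False
        self.stepping = False
        self.log = []                       # (address, request, value) records

    def _step_mode(self, flag):
        self.stepping = flag
        self.set_single_step(flag)

    def on_breakpoint(self, address):
        immediate, deferred = self.plan.get(address, ([], []))
        for req in immediate:               # everything except the destination: read now
            self.log.append((address, req, self.read(req)))
        if deferred:                        # destination: wait until the instruction runs
            self.pending = (address, deferred)
            self.prev_stepping = self.stepping
            self._step_mode(True)

    def on_single_step(self):
        if self.pending is None:
            return                          # a step we did not ask for
        address, deferred = self.pending
        for req in deferred:                # the instruction has now executed
            self.log.append((address, req, self.read(req)))
        self.pending = None
        self._step_mode(self.prev_stepping)  # restore the earlier single-step state
```

Saving and restoring the previous step state, rather than simply switching stepping off, matches the post's note about restoring the prior single-step state. That way the handler does not interfere if something else had already put the process into single-step mode.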
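The attach-or-load decision is simple enough to sketch directly. pydbg provides enumerate_processes(), which returns (pid, name) pairs, along with attach(pid) and load(path). The sketch below assumes the process is identified by its executable file name. That is a guess at how the real script matches processes, and the function name is hypothetical. ntpath is used so that Windows paths split correctly even when the code is tested on another OS.

```python
import ntpath


def attach_or_load(dbg, exe_path):
    """Attach to a running instance of exe_path, otherwise launch it.

    dbg is expected to behave like a pydbg instance: enumerate_processes()
    returning (pid, name) pairs, attach(pid) and load(path).
    Returns the attached pid, or None when the process was loaded fresh.
    """
    target = ntpath.basename(exe_path).lower()
    for pid, name in dbg.enumerate_processes():
        if name.lower() == target:  # first matching image name wins
            dbg.attach(pid)
            return pid
    dbg.load(exe_path)
    return None
```

Typical use would be to create a pydbg instance, pass it in with the target path, set the breakpoints, and then call dbg.run(). If several instances share a name (svchost.exe, for example), this attaches to the first one listed. A real tool would probably want a PID option for that case.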
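Importing the results back into IDA could look something like the following. This is not the author's IDC hack, which is not shown. It is a minimal sketch that assumes the captured results are available as (address, label, value) tuples, a hypothetical record shape. It emits an IDC script that calls set_cmt with the repeatable flag, so the regular comment holding the **LA tag is not overwritten. IDA versions before 7.0 name the equivalent function MakeRptCmt.

```python
def to_idc(records):
    """Build an IDC script that stores captured values as repeatable comments.

    records: iterable of (address, label, value) tuples (hypothetical shape).
    Values captured at the same address are joined into one comment.
    """
    by_addr = {}
    for address, label, value in records:
        by_addr.setdefault(address, []).append("%s=%s" % (label, value))
    lines = ["#include <idc.idc>", "", "static main()", "{"]
    for address in sorted(by_addr):
        text = "LA " + "; ".join(by_addr[address])
        text = text.replace("\\", "\\\\").replace('"', '\\"')  # escape for an IDC string
        lines.append('    set_cmt(0x%X, "%s", 1);' % (address, text))
    lines.append("}")
    return "\n".join(lines)
```

The generated text can be saved to a .idc file and run from IDA's File > Script file menu. Repeatable comments also show up at cross-references to that address, which may or may not be what you want.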
Combining the two (static and live) is something I try to do as often as possible, so that I don't have to keep switching between IDA and WinDbg or something similar (although it could be said that I now have to switch between IDA and the LA tools). Still, I find this much faster and easier on the eyes (the eyes part is also debatable :)). As stated above, I'm not in love with the output format and will eventually find something that fits better. In the future I would also like to expand this into a comprehensive IML (IDA Markup Language) that brings the static and live methods further together. Anyway, email or message me if you have suggestions or improvements.

gen_la_config.py: IDA Python script that generates the live analysis config file.
live_analysis.py: Live analysis command-line script that records the requested data.
PaiMei: needed for the live analysis portion (actually only pydbg is needed).