Introducing pynary
c1de0x <c1de0xgmailcom> Tuesday, December 11 2007 12:42.58 CST


Hey all,

I've posted some very very early code for pynary at http://openrce-snippets.googlecode.com/svn/trunk/standalone/pynary/.

For those of you who haven't been subjected to my speculation and badgering on the irc channel, I'll describe what this is and what it could, hopefully, become.

It all started when I decided that IDA's flirt wasn't doing a good enough job of matching standard library functions - missing many 'trivial to match' functions, incorrectly matching others, ignoring certain functions entirely, forcing the user to pick between 'clashing' function signatures, etc.

I started thinking about the particular issues related to the problem of matching library functions and how each of FLIRT's limitations could be overcome, and eventually decided that what the RCE community needs is a flexible and extensible binary function matching library.

So I started working on a script - in python of course - and it quickly became clear that I could do so much more with the infrastructure I was laying down. I decided to expand my horions a bit, and renamed the project ('gensig') to pynary.

pynary will hopefully become a powerful framework for binary code analysis.

The initial goal is to finish implementation of the signature matching goal using graph isomorphism and an extensible 'write-your-own-heuristic' model to tweak matching for particular targets. I also intend to identify standard library global constants and structure where possible.

Once the initial goal is acheived, I look forward to implementing a number of cool features:
* stack frame analysis
* un-inliner
* exception handling parsing/analysis
* 'functionally equivalent' matching
* c++ template function matching
* meta-data transfer between IDBs
* c++ class reconstruction (with/without RTTI)
* ...

At the moment, the bulk of the work has been to add COFF object support to ero's pefile. This functionality is not yet merged into the mainstream release of pefile, but will hopefully make it in when stable enough.

So far, the pynary.py file opens a .LIB file, enumerates its exported functions and generates a basic-block graph for each function using recursive traversal. These graphs in turn form a 'graph of graphs' of inter-function calls.

Calls to externally defined functions are not resolved, but in the next stage, when multiple .LIBs are loaded, externs will be resolved (sort of reverse-linking).

This graph can be traversed by matching algorithms.

Anyhow, feel free to contact me (email, irc, xmpp) with ideas and comments. Anyone who want's to get their hands dirty is more than welcome.

Comments
nicowow Posted: Wednesday, December 12 2007 07:29.55 CST
Hey c1de0x,
  We have been working on a similar function for Immunity Debugger, and obviously we walk through the same path (We code a half-finished libcoff.py) but as a stand-alone, not included on pelib. If you want, we can totally merge our stuff with yours.
  Check out our librecognition system on Libs/librecognition.py
inside immunity debugger.
  We are still working on improvement and more accuracy.

Cheers

c1de0x Posted: Wednesday, December 12 2007 08:53.12 CST
nicowow: It'd be great to merge what we can. Which version does it ship with?