OpenRCE Blog Entry

Ero Carrera (ero) <ero

carrera

gmail

com>

Tuesday, May 15 2007 20:43.00 CDT

This in a slightly expanded example of what Halvar and I showed in the Sabre-Security trainings last October. With it I illustrate how to, in BinNavi, build a small dataflow analyzer in Python so that it can do some work for us.

Lets imagine that theres a basic block like the following, which is part of the ROT13 routine in the Mydoom worm.

Ideally we would like to have such tool what would spare us from having to figure out what the assembly code in that block does, regardless of how simple it might be in this case. Such functionality would make analysis much less tedious and less error prone.

Given that BinNavi offers an interactive Python interpreter with access to every single function, basic block, instruction, operand and expressions within an operand, this is a problem that we could try to address with some scripting.

The dataflow reconstruction

Assembly instructions move data around and modify it by performing different operations on it. Values are fetched from registers, memory or specified explicitly and the results are stored back in registers or memory.

So, if we would like to be able to transform the operations into something more readable we could track all the assignments and operations and compose some sort of combined expression. Thats actually much easier than it might sound like.

Taking advantage of Pythons dictionaries (aka hashes, aka associative arrays, aka maps) we could proceed as follows.

We process individually each instruction. As each has different semantics, we have to deal with them individually (some groups of similar instructions can be handled somehow generically tho.).

First the instruction might push or pop something from the stack and we could emulate that behaviour with a simple list.

Second the instruction might get data from a source register/memory/immediate and, upon operating on it, assign it to a destination operand. We will just consider these two cases for the time being, this is only an exercise after all ;)

In this case we could store in our dictionary the source item using as key the destination register or memory location. That would allow us to keep the current state of the registers and memory locations. For instance: lea eax, [edi+2]

The destination of the lea operation is eax the source/result is [edi+2]. Hence we could have an entry in our dictionary that maps eax to edi+2, there are no brackets here because lea will store the value of the addition, not its contents, so we dont need the brackets (which stand for memory-dereference)

If a register is accessed later on by another instruction and we know where the value comes from (because we stored the previous assignment operation in our dictionary) we could compose it together, having a combined expression that tracks all data. Following the example, imagine the instruction mov bl, [eax+esi], that will put into bl the value pointed to by [eax+esi], in our dictionary we would map bl to have the value [eax+esi]

But, if we would check if we have any of the expression parts already in our dictionary (remember that eax maps to edi+2) then we can expand [eax+esi] into [edi+2+esi]. One might be able to infer now that this procedure can be taken further, composing longer expressions that would reduce the amount and complexity of the assembly code to look at.

As another illustrative example, one could assign the the AL key in the dictionary the nested lists representing the expression tree for the memory reference as seen in the following figure.

As just seen, operands are treated as graphs by BinNavi. That allows for a versatile manipulation of their contents. We can transform the tree to nested lists and operate on those.

The form of the nested lists is

root

child_1

child_2

Where each children has the same form

For instance, the following two operand trees

would take the following from as nested lists:

[b4, [ecx]]

[b4, [ss, [[, [+, [ebp], [4294967240]]]]]

In short, in order to do basic reconstruction of the data flow the following operations need to be performed:

Keep track of all assigned values using a dictionary

Substituting leafs with values already in our dictionary

Perform substitutions and assign SRCs to DSTs hence updating the dictionary to the last state

In a following post, Ill show an implementation using BinNavis embedded Python interpreter.

	Posted: Wednesday, December 31 1969 18:00.00 CST