Data Flow Analysis - Dynamic/Static Taint Analysis
Amro Jex (Amrojex) <ereversegmailcom> Monday, August 9 2010 11:27.15 CDT


I am currently building a tool, "ZeroTracer", that performs data flow analysis for user-supplied (tainted) inputs. ZeroTracer propagates tainted bytes memory-to-register, register-to-register, register-to-memory, and memory-to-memory, using a combination of dynamic and static analysis. The dynamic part uses Pin, a dynamic binary instrumentation tool, to collect a trace log of all memory reads/writes and register changes. The static part then runs on the logged trace to perform the taint propagation analysis.

The name ZeroTracer came from the lessons learned in building older versions of the tool, "0xtracer": when performing taint propagation, you should focus on the smallest unit of data being processed. I wish that unit could be a BIT, but unfortunately it is much more complicated to propagate BITs than BYTEs. It is better to use the byte as the unit, because in most register/memory propagation cases the smallest unit moved is one byte, for example a MOV to "al", "ah", or "cl". That is not the case when it comes to setting flags in the EFLAGS register, or when rotating or shifting a register. It is also easier to address tainted memory bytes than bits. Imagine a tainted byte at address [0x11122388]: if we wanted to taint it bit by bit, we would have to address it as something like [0x11122388(.0)(.1)(.2)(.3)(.4)(.5)(.6)(.7)].
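To make the byte-granularity idea concrete, here is a minimal Python sketch. It is not ZeroTracer's code: the class name `ByteTaint`, the `SUBREGS` table, and the method names are all hypothetical, and only two MOV forms are modeled. Taint is kept per memory byte and per register byte, so "al" and "ah" are simply bytes 0 and 1 of "eax".

```python
# Hypothetical sketch: names and layout are illustrative, not ZeroTracer's code.

# Map each register name to (parent register, byte offsets it covers).
SUBREGS = {
    "eax": ("eax", (0, 1, 2, 3)),
    "ax":  ("eax", (0, 1)),
    "al":  ("eax", (0,)),
    "ah":  ("eax", (1,)),
    "ecx": ("ecx", (0, 1, 2, 3)),
    "cx":  ("ecx", (0, 1)),
    "cl":  ("ecx", (0,)),
    "ch":  ("ecx", (1,)),
}


class ByteTaint:
    """Tracks taint per memory byte and per register byte."""

    def __init__(self):
        self.mem = set()   # tainted memory addresses, one entry per byte
        self.reg = set()   # tainted (parent_register, byte_offset) pairs

    def reg_bytes(self, name):
        parent, offsets = SUBREGS[name]
        return [(parent, off) for off in offsets]

    def mem_bytes(self, addr, size):
        return list(range(addr, addr + size))

    def mov_mem_to_reg(self, dst_reg, src_addr):
        """Model `mov reg, [addr]`: copy taint byte by byte, clearing clean bytes."""
        dst = self.reg_bytes(dst_reg)
        src = self.mem_bytes(src_addr, len(dst))
        for d, s in zip(dst, src):
            if s in self.mem:
                self.reg.add(d)
            else:
                self.reg.discard(d)

    def mov_reg_to_mem(self, dst_addr, src_reg):
        """Model `mov [addr], reg`: copy taint byte by byte, clearing clean bytes."""
        src = self.reg_bytes(src_reg)
        dst = self.mem_bytes(dst_addr, len(src))
        for d, s in zip(dst, src):
            if s in self.reg:
                self.mem.add(d)
            else:
                self.mem.discard(d)

    def is_reg_tainted(self, name):
        return any(b in self.reg for b in self.reg_bytes(name))


taint = ByteTaint()
taint.mem.add(0x11122388)                # one tainted input byte
taint.mov_mem_to_reg("al", 0x11122388)   # mov al, [0x11122388]
taint.mov_reg_to_mem(0x2000, "ax")       # mov [0x2000], ax
print(taint.is_reg_tainted("eax"), taint.is_reg_tainted("ah"))
print(sorted(hex(a) for a in taint.mem))
```

A bit-level variant would need keys such as (address, bit index), which is eight entries per byte, plus per-instruction rules for how each bit moves. That is the complexity the paragraph above describes.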

Before getting into the details of ZeroTracer's design, implementation, and features, which I am planning to cover in future posts, let's take an overview of its older version, "0xtracer", which is a PaiMei module that relies on pydbg. In 0xtracer I used memory breakpoints on the heap/stack to spot any code blocks touching the tainted memory bytes, and then performed static taint propagation analysis using pydasm on the basic block touching the tainted memory. 0xtracer used several techniques to tune the propagation analysis. Here is a list of all the techniques I used in 0xtracer:

1- Memory Break Points "Page Guard/NoAccess".
2- PyEmu "to emulate registers read/write".
3- Hardware Break Walking "A technique that I invented to taint/propagate a single byte in a single dynamic execution trace".
4- Code Block(s) Signature "The tool learns from the outputs of the above methods and fingerprints propagation patterns for future use".

0xtracer first consults its knowledge database to see whether the basic block touching the tainted memory is already defined there. If not, it performs static taint propagation analysis using pydasm, then emulated dynamic analysis using PyEmu. Once it recognizes a new pattern, it adds it to its knowledge database, and so on.
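The "look up first, analyze on a miss, then remember" flow described above could be sketched roughly as follows. This is an illustration only: the `KnowledgeBase` class, the SHA-256 signature over raw block bytes, and the rule format are assumptions, and `fake_analysis` is a stand-in for the pydasm and PyEmu passes rather than a call into either library.

```python
import hashlib


def block_signature(block_bytes):
    """Fingerprint a basic block by hashing its raw instruction bytes (assumed scheme)."""
    return hashlib.sha256(block_bytes).hexdigest()


class KnowledgeBase:
    """Caches one propagation rule per basic-block signature."""

    def __init__(self, analyze):
        self.rules = {}          # signature -> propagation rule
        self.analyze = analyze   # fallback analysis used on a cache miss
        self.misses = 0

    def propagation_for(self, block_bytes):
        sig = block_signature(block_bytes)
        if sig not in self.rules:
            # Unknown block: run the expensive analysis once and remember it.
            self.misses += 1
            self.rules[sig] = self.analyze(block_bytes)
        return self.rules[sig]


def fake_analysis(block_bytes):
    # Hypothetical stand-in for the static and emulated passes.
    # Returns a list of (source, destination) taint flows.
    return [("mem", "eax")]


kb = KnowledgeBase(fake_analysis)
block = bytes.fromhex("8b06c3")      # mov eax, [esi]; ret
first = kb.propagation_for(block)    # miss: analyzed and stored
second = kb.propagation_for(block)   # hit: served from the knowledge base
print(kb.misses, first == second)
```

Hashing raw bytes only matches byte-identical blocks. A real signature would probably need to normalize addresses and immediates so that the same propagation pattern is recognized in different locations.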

Comments
frozenrain Posted: Monday, August 9 2010 19:49.30 CDT
good! Attention!

tosanjay Posted: Tuesday, August 10 2010 03:30.54 CDT
This sounds very interesting. A question about your choice of Pin: you want to use instrumentation only to observe read/write operations. This is also possible with pydbg by inserting breakpoints; of course, you compromise on speed. Apart from that, is there any other reason for using Pin?
Another thing: I could not find any references to 0xtracer. Can you please mention where I can find it? I too am trying to do data-flow analysis on binaries and am looking for tools.
thanks

Amrojex Posted: Tuesday, August 10 2010 11:42.05 CDT
0xtracer is not released yet, and I doubt I will ever release it. But I am planning to release parts of its newer version, "ZeroTracer".

There are a lot of advantages to using dynamic binary instrumentation tools rather than debuggers. It was not only Pin's speed that motivated me to rewrite everything from scratch to use Pin instead of pydbg. That said, with Pin I have been able to instrument and log around 10,000,000 instructions in just 3 minutes! Such a result is impossible when relying on debuggers.

In Pin's instrumentation I am logging the following info:

0- Thread ID
1- Module
2- IP
3- Instruction Disassembly
4- Instruction Category
5- Opcode
6- Number of Operands
7- Operand1, Operand2, Operand3
8- Memory Access type "R,W,X,RW,WR,RR"
9- Number of Read Registers
10- Read Registers
11- Number of Written Registers
12- Written Registers

And in the analysis routines I am logging the following:

1- Memory Read Address / Read Size / Read Value
2- Memory Written Address / Size / Written Value
3- Written Register / Value Written
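As a rough illustration of how a static pass might consume such a log, here is a small Python sketch. The `TraceRecord` fields are a hypothetical subset of the items listed above, not ZeroTracer's actual log format, and the propagation rule is deliberately coarse: if any source of an instruction is tainted, all of its destinations become tainted, otherwise they are cleared. Registers are tracked whole here to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class TraceRecord:
    """One logged instruction (hypothetical subset of the logged fields)."""
    thread_id: int
    ip: int
    disasm: str
    read_regs: list = field(default_factory=list)
    written_regs: list = field(default_factory=list)
    mem_reads: list = field(default_factory=list)    # (address, size) pairs
    mem_writes: list = field(default_factory=list)   # (address, size) pairs


def expand(ranges):
    """Turn (address, size) pairs into the set of individual byte addresses."""
    return {addr + i for addr, size in ranges for i in range(size)}


def replay(trace, tainted_mem):
    """Walk the trace in order and propagate taint from sources to destinations."""
    mem = set(tainted_mem)   # tainted memory bytes
    regs = set()             # tainted registers (whole-register granularity)
    hits = []                # instruction pointers that touched tainted data
    for rec in trace:
        src_tainted = bool(expand(rec.mem_reads) & mem) or bool(set(rec.read_regs) & regs)
        dst_mem = expand(rec.mem_writes)
        dst_regs = set(rec.written_regs)
        if src_tainted:
            mem |= dst_mem
            regs |= dst_regs
            hits.append(rec.ip)
        else:
            # Overwritten with clean data: destinations lose their taint.
            mem -= dst_mem
            regs -= dst_regs
    return mem, regs, hits


trace = [
    TraceRecord(0, 0x401000, "mov eax, [0x1000]", written_regs=["eax"], mem_reads=[(0x1000, 4)]),
    TraceRecord(0, 0x401005, "mov ebx, 5", written_regs=["ebx"]),
    TraceRecord(0, 0x40100a, "mov [0x2000], eax", read_regs=["eax"], mem_writes=[(0x2000, 4)]),
]
mem, regs, hits = replay(trace, {0x1000})
print(sorted(regs), [hex(ip) for ip in hits])
```

Note that this rule treats every read register as a data source, so a tainted address register would also taint the destination. Whether to propagate through pointers is a policy choice that a real tool has to make explicitly.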

Before using Pin, I had to write a lot of analysis routines in 0xtracer "using pydbg, pydasm, pyemu" to get the same results listed above. With Pin's rich API, my code has been reduced by almost 80%, giving the same results with no false positives.

dyjakan Posted: Tuesday, August 10 2010 15:34.27 CDT
What motivated you to choose Pin over DynamoRIO? I ask because I'm going to start a little project and I'm facing the decision between Pin and DynamoRIO.

Amrojex Posted: Thursday, August 12 2010 12:48.05 CDT
I chose Pin for its cross-platform support.

archies50 Posted: Wednesday, October 31 2012 19:05.48 CDT
Can you give the details of ZeroTracer, especially the part where you taint the EFLAGS register?