Flag: Tornado! Hurricane!

Blogs >> RolfRolles's Blog

Created: Sunday, January 22 2012 08:14.27 CST Modified: Monday, January 23 2012 03:22.24 CST
Direct Link, View / Make / Edit Comments
Finding Bugs in VMs with a Theorem Prover, Round 1
Author: RolfRolles # Views: 6387

Here's some work I did last fall.

Equivalence checking is a popular technique in the academic literature and in industrial practice, used to verify that two "things" are equivalent.  Commonly, it involves translating those things into some logic that is supported by a theorem prover, and asking the decision procedure to prove whether the two things must always be equal.  Equivalently, we ask whether there exists any circumstances where the two things might not be equal.  If the two things must always be equal under all circumstances, then a complete decision procedure will eventually report that the formula is unsatisfiable.  If they can ever deviate, then it will return a "counterexample" illustrating a scenario in which the things differ.  The decision procedure may not terminate in a reasonable amount of time, or do so at all if the logic is undecidable or the decision procedure is incomplete.

Equivalence checking is commonly found in the hardware world (particularly, in a field known as Electronic Design Automation, or EDA), but it has been applied to software as well, for tasks such as verification that compiler optimizations have been implemented correctly, verification of cryptographic implementations, and other forms of specification-conformance.  In this blog, we'll take a look at an application of equivalence checking in deobfuscation.

Equivalence checking
--------------------

We begin by giving an example proving the equivalence between an x86 sequence and a machine instruction.

Bit-level hackers -- those who study the works of the original MIT "hackers", whose work bequeathed the term "hacker" -- will know the following example regarding taking the population count of a 32-bit integer.  The "population count" of a given integer is the number of one-bits set in the integer.  This operation is known as "popcnt", and the x86 instruction set features an instruction of the same name, whose syntax is "popcnt r32, r/m32" and whose semantics dictate that the population count of the right-hand side (32- or 16-bit, depending upon encoding) be deposited into the location specified by the left-hand side.  We can easily imagine implementing this functionality with a loop:


int popcnt = 0;
for(int mask = 1; mask; popcnt += !!(mask & value_to_compute_popcnt_of), mask <<= 1);


A hoary MIT HAKMEM gem shows how to compute the same value via a series of arithmetic manipulations and no explicit loop.  We summarize the procedure below.

Consider some 8-bit integer, whose bits we shall represent symbolically with letters as in the following:


abcdefgh


Let's demonstrate how the method computes the population count of this integer.  First, we separate out all of the even bits and the odd bits:


x = ((x & 0xAA) >> 1) + (x 0x55);

abcdefgh AND
10101010
--------
a0c0e0g0 SHR 1
--------
0a0c0e0g


abcdefgh AND
01010101
--------
0b0d0f0h


Adding these two quantities together, we obtain the following result:


wwxxyyzz


Where ww, xx, yy, and zz are each two-bit quantities such that ww = a + b, xx = c + d, yy = e + f, and zz = g + h.  By adding these masked-and-shifted quantities together, we thereby compute the sum of two adjacent bits, and store the sum in that location within the integer that corresponds to those two bits.  Given that 0+0 = 0, 0+1 = 1, 1+0 = 1, and 1+1 = 10, the sum can never overflow, and is hence neatly contained in those two bits.

Next, we perform the same kind of trick to compute the sum of each two-bit quantity.  We use a similar shift-and-add tableau:


x = ((x & 0xCC) >> 2) + (x & 0x33);

wwxxyyzz AND
11001100
--------
ww00yy00 SHR 2
--------
00ww00yy

wwxxyyzz AND
00110011
--------
00xx00zz


Adding these two quantities together, we get:


iiiijjjj


Where iiii and jjjj are the 4-bit results such that iiii = ww + xx and jjjj = yy + zz.  Once again, this sum is safe from overflow because ww and xx are at most 2-bits, whereas iiii and jjjj are 4-bits.

We pull the same trick again to sum the quantities iiii and jjjj:


x = ((x & 0xF0) >> 4) + (x & 0x0F);

iiiijjjj AND
11110000
--------
iiii0000 SHR 4
--------
0000iiii

iiiijjjj AND
00001111
--------
0000jjjj


Adding these two quantifies together, we get kkkkkkkk, where kkkkkkkk = iiii + jjjj = ww + xx + yy + zz = a + b + c + d + e + f + g + h.  This corresponds to the definition of the population count of an 8-bit integer.  We can easily imagine extending the same procedure for 16-, 32-bit, 64-bit, and generally 2^n-bit quantities.  Below we display a fragment of x86 assembly language that computes the population count for a 32-bit integer.

Sequence (1) -- messy bit hack for eax = popcnt(ebx)

mov eax, ebx      
and eax, 55555555h
shr ebx, 1        
and ebx, 55555555h
add ebx, eax      
mov eax, ebx      
and eax, 33333333h
shr ebx, 2        
and ebx, 33333333h
add ebx, eax      
mov eax, ebx      
and eax, 0F0F0F0Fh
shr ebx, 4        
and ebx, 0F0F0F0Fh
add ebx, eax      
mov eax, ebx      
and eax, 00FF00FFh
shr ebx, 8        
and ebx, 00FF00FFh
add ebx, eax      
mov eax, ebx      
and eax, 0000FFFFh
shr ebx, 16      
and ebx, 0000FFFFh
add ebx, eax      
mov eax, ebx      


This is equivalent to the x86 instruction

Sequence (2) -- direct ISA computation of eax = popcnt(ebx)

popcnt eax, ebx


Except that the former example also modifies the flags in some certain way, and modifies the ebx register, whereas the popcnt instruction modifies the flags in a different way and does not modify ebx.

We would like to prove the equivalence of sequences (1) and (2) with respect to the relationship of the resulting eax register with the original ebx register.  We must not consider the resulting value of the ebx register or the flags, for in that case, the sequences will not be equivalent in general.  The code for performing this task in Pandemic follows.


(* Define the x86 sequences *)
let hack_sequence_x86   = (* declaration of sequence (1) as X86.x86instrpref variant types *)
let popcnt_sequence_x86 = (* declaration of sequence (2) as X86.x86instrpref variant types *)

(* This function converts each sequence to IR *)
let make_ir_sequence x86l = List.concat (List.map (X86ToIR.translate_instr 0l) x86l)

(* These bindings are the IR translations *)
let hack_sequence_ir   = make_ir_sequence   hack_sequence_x86
let popcnt_sequence_ir = make_ir_sequence popcnt_sequence_x86

(* These bindings are the results of converting the IR sequences to SSA form *)
let   hack_tbl,  hack_sequence_ir_ssa = IRSSA.bb_to_ssa_state_out X86ToIRUtil.num_reserved_vars   hack_sequence_ir
let popcnt_tbl,popcnt_sequence_ir_ssa = IRSSA.bb_to_ssa_state_out X86ToIRUtil.num_reserved_vars popcnt_sequence_ir

(* The postcondition says that sequence(1).eax != sequence(2).eax *)
let hack_popcnt_postcondition =
  IRUtil.mk_not
   (IRUtil.mk_eq
     (IRUtil.mk_evar (Hashtbl.find   hack_tbl X86ToIRUtil.vEax))
     (IRUtil.mk_evar (Hashtbl.find popcnt_tbl X86ToIRUtil.vEax)))

(* We query the theorem prover as to whether eax from the ends of the respective
   sequences are equal.  We do this by asking the theorem prover to generate an
   example where they are not equal, which it is unable to do (the formula is
   unsatisfiable), and therefore they are equal on all inputs. *)
let _ = IDA.msg "%s\n" (Z3SymbolicExecute.symbolic_execute (Z3SymbolicExecute.mk_context ()) (hack_sequence_ir_ssa@popcnt_sequence_ir_ssa) hack_popcnt_postcondition)


The theorem prover cranks for 10 seconds and returns that the formula is unsatisfiable, in other words, that it was not able to generate a counterexample where sequence(1).eax != sequence(2).eax.  Assuming the soundness of Z3 with respect to propositional logic over the bitvector and array theories of finite domain, we have just generated a mathematical proof that the sequences have the same effect on the eax register.  Z3 can be configured to output the proof explicitly so that it may be checked by other tools.

Equivalence checking for verification of deobfuscation results
--------------------------------------------------------------

I had written a deobfuscation procedure (e.g., a program of the type described here) for a commercial VM ("VM" meaning virtualization obfuscator in this context), and I wanted to know whether my deobfuscation was correct, since I felt like I might've taken some shortcuts along the way.  Given access to some obfuscating transformation O, and access to a deobfuscating transformation D, we want to know if D(O(p)) is a semantics-preserving translation for all programs p.  We'll settle for random testing:  generating a "random" executable according to some guidelines, obfuscating it, deobfuscating it, and comparing (by equivalence checking) the deobfuscated program against the original, pre-obfuscated version.  As it turns out, the translation D(O(p)) is not semantics-preserving because VM itself has many bugs that could affect the translation of the original x86 into the VM bytecode.  

The procedure in the last paragraph can provide us with straight-line sequences of code (or control flow graphs) such that one is the original, and the other is the obfuscated version with deobfuscation applied.  We can then apply equivalence checking to determine whether these sequences are semantically equivalent.

It turns out that they often are not.  For example, consider this obfuscated sequence (which has been mostly deobfuscated for the sake of easy illustration):


mov cx, word ptr [esp]
add esp, 2
ror dword ptr ss:[esp], cl
pushfd


And the corresponding deobfuscated sequence:


pop cx
ror dword ptr ss:[esp], cl
pushfd


The former example has a similar effect as the latter one, but it modifies the flags before the rol instruction executes.  The rol instruction does not change the flags if cl is zero, therefore, the two sequences are not semantically equivalent if cl = 0.  The theorem prover finds this bug, the same bug for ror word ptr and ror byte ptr, and the equivalent bugs for rcr, rcl, rol, shl, shr, and sar.  This is a total of 21 buggy VM handlers.

Similarly, obfuscation can also affect the value of the flags register prior to the execution of the inc and dec instructions.  Consider this deobfuscated sequence:


pop eax
dec dword ptr ss:[esp]
pushfd


inc and dec do not change the value of the carry flag, but add and sub do.  Therefore, obfuscation applied upon the vicinity of the pop eax instruction can destroy the value of the carry flag in these circumstances.  As there are three sizes that these instructions might be applied to, there are 6 bugs of this variety.

The theorem prover also found more subtle bugs.  Consider the following deobfuscated instruction sequence, where esi points to the VM bytecode.  


lodsd dword ptr ds:[esi]
sub eax, ebx
xor eax, 7134B21Ah
add eax, 2564E385h
xor ebx, eax
movzx ax, byte ptr ds:[eax]
push ax


The program's obfuscator would alter the behavior of this snippet by introducing writes to the stack that were not present in the original version, therefore the state of the stack after execution of the obfuscated and non-obfuscated sequences would differ.  The memory read on the second-to-last line might therefore target the area below the stack pointer, which would cause it to read a different value in the obfuscated and deobfuscated world.  This bug is unlikely to manifest itself in the obfuscation of a real-world program, although it demonstrates the ruthless efficacy of theorem proving towards this problem:  the ability to discover tricky situations like this one by considering all possible concrete behaviors of the individual snippets, and generate countermodels for inequal scenarios, no matter how far-fetched.

When taking these factors into consideration, I am able to say that my deobfuscator is correct modulo the bugs in the original obfuscator, which demonstrably number at least 33 (or about 20% of the total handlers).

Created: Saturday, July 9 2011 02:36.41 CDT  
Direct Link, View / Make / Edit Comments
Control Flow Deobfuscation via Abstract Interpretation
Author: RolfRolles # Views: 5307

I present some work that I did involving automatic deobfuscation of obfuscated control flow constructs with abstract interpretation.  Considering this image, the code is responsible for taking graphs like the one on the left (where most of the "conditional" branches actually only go in one direction and are only present to thwart static analysis) and converting them into graphs like the one on the right.

Much work on deobfuscation relies on pattern-matching at least to some extent; I have coded such tools myself.  I have some distaste for such methods, since they stop working when the patterns change (they are "syntactic").  I prefer to code my deobfuscation tools as generically ("semantically") as possible, such that they capture innate properties of the obfuscation method in question, rather than hard-coding individual instances of the obfuscation.

The slides present a technique based on abstract interpretation, a form of static program analysis, for deobfuscating control flow transfers.  I translate the x86 code into a different ("intermediate") language, and then perform an analysis based on three-valued logic over the translated code.  The end result is that certain classes of opaque predicates (conditional jumps that are either always taken or always not taken) are detected and resolved.  I have successfully used this technique to break several protections making use of similar obfuscation techniques.  

Although I invented and implemented these techniques independently, given the wealth of work in program analysis, it wouldn't surprise me to learn that the particular technique has been previously invented.  Proper references are appreciated.

Code is also included.  The source relies upon my Pandemic program analysis framework, which is not publicly available.  Hence, the code is for educational purposes only.  Nonetheless, I believe it is one of very few examples of publicly-available source code involving abstract interpretation on binaries.

PPTX presentation, OCaml source code.

Created: Thursday, June 10 2010 15:17.00 CDT  
Direct Link, View / Make / Edit Comments
PatchDiff2 Analysis and Decompilation
Author: RolfRolles # Views: 6920

Now that PatchDiff2 is open-source, I release my IDB and the initial decompilation that I did.  You can find it here.

I met the author, Nicolas Pouvesle, at RECon 2008, who told me of his upcoming plans to release this plugin.  He mentioned that the license would be friendly towards reverse engineering, so I asked him if I could decompile the plugin and release the source code.  We must have miscommunicated, because I could have sworn he gave me the green light.  

When it was released, I spent about three days reverse engineering and decompiling it, resulting in the workproduct linked above.  I had intended to spend an extra couple of days on the decompilation to make sure it was functionally equivalent, if not byte-perfect (which is always the ultimate goal).

After I finished the initial phase of the decompilation, I sent it over to Nicolas to solicit his feedback.  He informed me that releasing the source code would violate PatchDiff2's EULA.  Therefore, I abandoned the project.  As it stands, I never even tried to compile the source code, so I'm afraid it's not worth much beyond the mere curiosity.  Still, I'm releasing this hoping that somebody might find it interesting.  The IDB itself is very thorough, e.g. all structures are recovered.

Enjoy.

Created: Monday, March 8 2010 15:45.22 CST  
Direct Link, View / Make / Edit Comments
Compiler Optimizations for Reverse Engineers
Author: RolfRolles # Views: 15203

I've decided to release the part of my training material relating to compiler optimizations.  I created this back in January of 2007; for a while I was teaching classes very often, and so it made sense to keep it private.  Nowadays I only teach a few classes a year, and so this presentation is sort of languishing away on my hard drive, which is a shame since it's my favorite part.  I think people will enjoy reading it, so I have decided to make it public.

Since I made this in early 2007, it lacks some of GCC 4's optimizations, and from time to time I realize that I forgot an optimization or two.  But for the most part, it's fairly complete.  If you have any specific suggestions for optimizations that I missed, please either email me or respond via comment with a link to a binary exhibiting said optimization and the address at which I can find it.  Proper compiler-theoretic names for the optimizations, if applicable, are also appreciated.

Here it is.  Enjoy.

Created: Friday, December 25 2009 13:45.13 CST  
Direct Link, View / Make / Edit Comments
Code release: C-subset compiler in Objective Caml
Author: RolfRolles # Views: 7714

Here is the source code for a compiler that I wrote in Objective Caml this semester, for a subset of the C language.  It requires a standalone C->IR translator which is not included in this release, as the school owns the copyright on that particular piece of code.  Hence one cannot immediately use this compiler to compile C programs without writing a C front end; anyway, an existing compiler such as MSVC or GCC would be a better choice.

The portion of the code that I wrote (everything except bitv.ml and .mli) totals roughly 3200 lines of code.  It includes two optimizations based on classical data-flow analysis, constant propagation and dead-statement elimination.  It also supports translation into and out of static single assignment form, as well as two optimizations based on SSA:  constant propagation and loop-invariant code motion.  The code for the graph-coloring register allocator is not included.  As a back-end, the compiler produces C code that can be compiled by any other C compiler.

The code is a pretty good example of how to structure medium-sized programs in functional programming languages.  I tried to adopt a pure functional style throughout most of the codebase.  Sometimes this failed:  cfg_ir.ml is unnecessarily ugly, and should have been re-written in an imperative style with mutability; also, I made the mistake of using a mutable array to describe Phi values in static single assignment whereas pure functional style would have dictated a list.  But those are my only complaints with the code; overall, I'm pretty pleased with how it all turned out.

This code is substantially more sophisticated than the compiler that I wrote to break VMProtect, so if you can read and understand this release, you should be in good shape for breaking virtualization obfuscators.


Archived Entries for RolfRolles
Subject # Views Created On
VinE's OCaml Programming Tricks: Explicit Continuation-Passing Style 8812     Friday, December 4 2009
Structure Recovery as Counter-Example Guided Abstraction Refinement 7074     Friday, October 9 2009
InfoSec Institute's RE Course 5275     Sunday, March 15 2009
Switch as Binary Search, Part 1 4976     Friday, November 28 2008
Switch as Binary Search, Part 0 4266     Friday, November 28 2008
Bagle.W IDB 3962     Wednesday, November 5 2008
Part 3: Optimizing and Compiling 9087     Wednesday, August 6 2008
Part 2: Introduction to Optimization 5889     Wednesday, August 6 2008
Part 1: Bytecode and IR 5635     Wednesday, August 6 2008
VMProtect, Part 0: Basics 9618     Wednesday, August 6 2008
Compiler 1, X86 Virtualizer 0 14570     Friday, April 4 2008
Industrial-Grade Binary-Only Profiling and Coverage 5075     Saturday, February 16 2008
Array Indexing Qurik 2376     Wednesday, February 13 2008
SpyShredder Malware Spammed on OpenRCE 3277     Tuesday, February 5 2008
Compiler Optimizations Regarding Structures 3095     Tuesday, January 22 2008
Binary Search in Large-Scale Structure Recovery 2217     Monday, January 21 2008
CommWarrior.B Thorough IDB (ARM/C++) 4012     Thursday, January 3 2008
Weird Code: CCs on the Stack 2011     Tuesday, January 1 2008
T2 2006 VM Analysis 4013     Wednesday, October 10 2007
My Training Class 2864     Monday, August 13 2007
ProcDump Thorough IDB 5479     Friday, June 15 2007
IDA's .IDS Files Part II 4735     Thursday, June 7 2007
IDA's .IDS Files Part I 5123     Thursday, June 7 2007
Ten Years of Reverse Engineering 3026     Wednesday, June 6 2007
Shellcode Analysis 6530     Tuesday, April 3 2007

There are 28,212 total registered users.


Recently Created Topics
Reverse Engineering ...
Jan/23
Career: DoD Agency I...
Jan/22
"Disappearing&q...
Jan/17
Career: Software Sec...
Jan/11
Where is the call st...
Jan/07
IDA Pro 6.1 Breakpoi...
Jan/01
How to create data s...
Dec/30
can i search all mod...
Dec/23
IDA symbol table exp...
Dec/20
An anti-attach trick
Dec/17


Recent Forum Posts
Reverse Engineering ...
NirIzr
"Disappearing&q...
NirIzr
Reverse Engineering ...
charlie
"Disappearing&q...
charlie
An anti-attach trick
Bass
An anti-attach trick
waleeda...
An anti-attach trick
Bass
An anti-attach trick
waleeda...
An anti-attach trick
Bass
Looking for value in...
NirIzr


Recent Blog Entries
Ludwig
Feb/04
chi on sale

Ludwig
Feb/04
Monster In The Vicinity Of ...

Ludwig
Feb/04
Supra footwear Online

waleedassar
Jan/31
Yet Another Anti-Debug Trick

RolfRolles
Jan/22
Finding Bugs in VMs with a ...

More ...


Recent Blog Comments
waleedassar on:
Feb/01
Yet Another Anti-Debug Trick

NirIzr on:
Jan/31
Yet Another Anti-Debug Trick

jackchen on:
Jan/10
nike mercurial vapor iii

waleedassar on:
Dec/27
A new Anti-Olly trick.

PeterFerrie on:
Dec/27
A new Anti-Olly trick.

More ...


Imagery
SoySauce Blueprint
Jun 6, 2008

[+] expand

View Gallery (11) / Submit