|
Finding Bugs in VMs with a Theorem Prover, Round 1
Here's some work I did last fall. Equivalence checking is a popular technique in the academic literature and in industrial practice, used to verify that two "things" are equivalent. Commonly, it involves translating those things into some logic that is supported by a theorem prover, and asking the decision procedure to prove whether the two things must always be equal. Equivalently, we ask whether there exists any circumstances where the two things might not be equal. If the two things must always be equal under all circumstances, then a complete decision procedure will eventually report that the formula is unsatisfiable. If they can ever deviate, then it will return a "counterexample" illustrating a scenario in which the things differ. The decision procedure may not terminate in a reasonable amount of time, or do so at all if the logic is undecidable or the decision procedure is incomplete. Equivalence checking is commonly found in the hardware world (particularly, in a field known as Electronic Design Automation, or EDA), but it has been applied to software as well, for tasks such as verification that compiler optimizations have been implemented correctly, verification of cryptographic implementations, and other forms of specification-conformance. In this blog, we'll take a look at an application of equivalence checking in deobfuscation. Equivalence checking -------------------- We begin by giving an example proving the equivalence between an x86 sequence and a machine instruction. Bit-level hackers -- those who study the works of the original MIT "hackers", whose work bequeathed the term "hacker" -- will know the following example regarding taking the population count of a 32-bit integer. The "population count" of a given integer is the number of one-bits set in the integer. This operation is known as "popcnt", and the x86 instruction set features an instruction of the same name, whose syntax is "popcnt r32, r/m32" and whose semantics dictate that the population count of the right-hand side (32- or 16-bit, depending upon encoding) be deposited into the location specified by the left-hand side. We can easily imagine implementing this functionality with a loop: A hoary MIT HAKMEM gem shows how to compute the same value via a series of arithmetic manipulations and no explicit loop. We summarize the procedure below. Consider some 8-bit integer, whose bits we shall represent symbolically with letters as in the following: Let's demonstrate how the method computes the population count of this integer. First, we separate out all of the even bits and the odd bits: Adding these two quantities together, we obtain the following result: Where ww, xx, yy, and zz are each two-bit quantities such that ww = a + b, xx = c + d, yy = e + f, and zz = g + h. By adding these masked-and-shifted quantities together, we thereby compute the sum of two adjacent bits, and store the sum in that location within the integer that corresponds to those two bits. Given that 0+0 = 0, 0+1 = 1, 1+0 = 1, and 1+1 = 10, the sum can never overflow, and is hence neatly contained in those two bits. Next, we perform the same kind of trick to compute the sum of each two-bit quantity. We use a similar shift-and-add tableau: Adding these two quantities together, we get: Where iiii and jjjj are the 4-bit results such that iiii = ww + xx and jjjj = yy + zz. Once again, this sum is safe from overflow because ww and xx are at most 2-bits, whereas iiii and jjjj are 4-bits. We pull the same trick again to sum the quantities iiii and jjjj: Adding these two quantifies together, we get kkkkkkkk, where kkkkkkkk = iiii + jjjj = ww + xx + yy + zz = a + b + c + d + e + f + g + h. This corresponds to the definition of the population count of an 8-bit integer. We can easily imagine extending the same procedure for 16-, 32-bit, 64-bit, and generally 2^n-bit quantities. Below we display a fragment of x86 assembly language that computes the population count for a 32-bit integer. Sequence (1) -- messy bit hack for eax = popcnt(ebx) This is equivalent to the x86 instruction Sequence (2) -- direct ISA computation of eax = popcnt(ebx) Except that the former example also modifies the flags in some certain way, and modifies the ebx register, whereas the popcnt instruction modifies the flags in a different way and does not modify ebx. We would like to prove the equivalence of sequences (1) and (2) with respect to the relationship of the resulting eax register with the original ebx register. We must not consider the resulting value of the ebx register or the flags, for in that case, the sequences will not be equivalent in general. The code for performing this task in Pandemic follows. The theorem prover cranks for 10 seconds and returns that the formula is unsatisfiable, in other words, that it was not able to generate a counterexample where sequence(1).eax != sequence(2).eax. Assuming the soundness of Z3 with respect to propositional logic over the bitvector and array theories of finite domain, we have just generated a mathematical proof that the sequences have the same effect on the eax register. Z3 can be configured to output the proof explicitly so that it may be checked by other tools. Equivalence checking for verification of deobfuscation results -------------------------------------------------------------- I had written a deobfuscation procedure (e.g., a program of the type described here) for a commercial VM ("VM" meaning virtualization obfuscator in this context), and I wanted to know whether my deobfuscation was correct, since I felt like I might've taken some shortcuts along the way. Given access to some obfuscating transformation O, and access to a deobfuscating transformation D, we want to know if D(O(p)) is a semantics-preserving translation for all programs p. We'll settle for random testing: generating a "random" executable according to some guidelines, obfuscating it, deobfuscating it, and comparing (by equivalence checking) the deobfuscated program against the original, pre-obfuscated version. As it turns out, the translation D(O(p)) is not semantics-preserving because VM itself has many bugs that could affect the translation of the original x86 into the VM bytecode. The procedure in the last paragraph can provide us with straight-line sequences of code (or control flow graphs) such that one is the original, and the other is the obfuscated version with deobfuscation applied. We can then apply equivalence checking to determine whether these sequences are semantically equivalent. It turns out that they often are not. For example, consider this obfuscated sequence (which has been mostly deobfuscated for the sake of easy illustration): And the corresponding deobfuscated sequence: The former example has a similar effect as the latter one, but it modifies the flags before the rol instruction executes. The rol instruction does not change the flags if cl is zero, therefore, the two sequences are not semantically equivalent if cl = 0. The theorem prover finds this bug, the same bug for ror word ptr and ror byte ptr, and the equivalent bugs for rcr, rcl, rol, shl, shr, and sar. This is a total of 21 buggy VM handlers. Similarly, obfuscation can also affect the value of the flags register prior to the execution of the inc and dec instructions. Consider this deobfuscated sequence: inc and dec do not change the value of the carry flag, but add and sub do. Therefore, obfuscation applied upon the vicinity of the pop eax instruction can destroy the value of the carry flag in these circumstances. As there are three sizes that these instructions might be applied to, there are 6 bugs of this variety. The theorem prover also found more subtle bugs. Consider the following deobfuscated instruction sequence, where esi points to the VM bytecode. The program's obfuscator would alter the behavior of this snippet by introducing writes to the stack that were not present in the original version, therefore the state of the stack after execution of the obfuscated and non-obfuscated sequences would differ. The memory read on the second-to-last line might therefore target the area below the stack pointer, which would cause it to read a different value in the obfuscated and deobfuscated world. This bug is unlikely to manifest itself in the obfuscation of a real-world program, although it demonstrates the ruthless efficacy of theorem proving towards this problem: the ability to discover tricky situations like this one by considering all possible concrete behaviors of the individual snippets, and generate countermodels for inequal scenarios, no matter how far-fetched. When taking these factors into consideration, I am able to say that my deobfuscator is correct modulo the bugs in the original obfuscator, which demonstrably number at least 33 (or about 20% of the total handlers). Comments
| ||||||