OpenRCE

📚 OpenRCE is preserved as a read-only archive. Launched at RECon Montreal in 2005. Registration and posting are disabled.


Created: Wednesday, August 6 2008 18:08.20 CDT


Part 3: Optimizing and Compiling

Author: RolfRolles


The reader should inspect this unoptimized IR listing before continuing. In an attempt to keep this entry from becoming unnecessarily long, the example snippets will be small, but for completeness a more thorough running example is linked throughout the text.

We begin by removing the stack machine features of the IR. Since VMProtect operates on disassembled x86 code, and x86 itself is not a stack machine, this aspect of the protection is unnatural and easily removed. Here is a 16-line fragment of VMProtect IR.

push Dword(-88)

push esp

push Dword(4)

pop t3

pop t4

t5 = t3 + t4

push t5

push flags t5

pop DWORD Scratch:[Dword(52)]

pop t6

pop t7

t8 = t6 + t7

push t8

push flags t8

pop DWORD Scratch:[Dword(12)]

pop esp

All but two instructions are pushes or pops, and the pushes can be easily matched up with the pops. Tracking the stack pointer, we see that, for example, t3 = Dword(4). A simple analysis allows us to "optimize away" the push/pop pairs into assignment statements: iterate through each instruction in a basic block, keeping a stack that describes the source of each push. For every pop, ensure that the sizes match and record the location of the corresponding push. We then replace the pop with an assignment of the pushed expression to the popped location, as in

t3 = Dword(4)

t4 = esp 

t7 = Dword(-88)

With the stack aspects removed, we are left with a more conventional listing containing many assignment statements. This optimization substantially reduces the number of instructions in a given basic block (~40% for the linked example) and opens the door for other optimizations. The newly optimized code is nine lines, roughly half of the original:

t3 = Dword(4)

t4 = esp

t5 = t3 + t4

DWORD Scratch:[Dword(52)] = flags t5

t6 = t5

t7 = Dword(-88)

t8 = t6 + t7

DWORD Scratch:[Dword(12)] = flags t8

esp = t8

A complete listing of the unoptimized IR versus the one with the stack machine features removed is here, which should be perused before proceeding.
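The push/pop matching pass described above can be sketched in Python as follows. This is only a minimal illustration, not the code that produced the listings: the tuple encoding of IR instructions and the names `remove_stack_ops` and `FRAGMENT` are assumptions made for this example, every operand is assumed to be a dword, and a pushed operand is assumed not to be redefined between the push and its matching pop.

```python
def remove_stack_ops(block):
    """Turn matched push/pop pairs in one basic block into assignments.

    Hypothetical encoding: ("push", src, size), ("pop", dst, size)
    and ("assign", dst, src).
    """
    stack = []     # indices of pushes not yet consumed by a pop
    matched = {}   # index of a pop -> index of the push that feeds it
    for i, ins in enumerate(block):
        if ins[0] == "push":
            stack.append(i)
        elif ins[0] == "pop" and stack:
            j = stack.pop()
            if block[j][2] != ins[2]:
                raise ValueError(f"size mismatch: push {j} vs pop {i}")
            matched[i] = j
        # A pop seen with an empty stack consumes a value pushed by an
        # earlier block, so it is left untouched.

    consumed = set(matched.values())
    out = []
    for i, ins in enumerate(block):
        if i in consumed:
            continue                      # push folded into its pop
        if i in matched:
            out.append(("assign", ins[1], block[matched[i]][1]))
        else:
            out.append(ins)               # assignments, unmatched pushes/pops
    return out


# The fragment from the text, with an assumed size of 4 bytes throughout.
FRAGMENT = [
    ("push", "Dword(-88)", 4),
    ("push", "esp", 4),
    ("push", "Dword(4)", 4),
    ("pop", "t3", 4),
    ("pop", "t4", 4),
    ("assign", "t5", "t3 + t4"),
    ("push", "t5", 4),
    ("push", "flags t5", 4),
    ("pop", "DWORD Scratch:[Dword(52)]", 4),
    ("pop", "t6", 4),
    ("pop", "t7", 4),
    ("assign", "t8", "t6 + t7"),
    ("push", "t8", 4),
    ("push", "flags t8", 4),
    ("pop", "DWORD Scratch:[Dword(12)]", 4),
    ("pop", "esp", 4),
]

for _, dst, src in remove_stack_ops(FRAGMENT):
    print(f"{dst} = {src}")
```

Run on the fragment, the sketch emits one assignment per pop plus the two original additions. Matching is done in a first pass and rewriting in a second so that pushes left unmatched at the end of the block stay in their original positions.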

Now we turn our attention to the temporary variables and the scratch area. Recall that the former were not part of the pre-protected x86 code, nor the VMProtect bytecode -- they were introduced in order to ease the IR translation. The latter is part of the VMProtect bytecode, but was not part of the original pre-protected x86 code. Since these are not part of the languages we are modelling, we shall eliminate them wholesale. On a high level, we treat each temporary variable, each byte of the scratch space, and each register as being a variable defined within a basic block, and then eliminate the former two via the compiler optimizations previously discussed.

Looking again at the last snippet of IR, we can see several areas for improvement. First, consider the variable t6. It is clearly just a copy of t5, neither of which is redefined before the next use in the assignment to t8. Copy propagation will replace variable t6 with t5 and eliminate the former. More generally, t3, t4, and t7 contain either constants or values that are not modified between their uses. Constant and copy propagation will substitute the assignments to these variables in for their uses and eliminate them.

The newly optimized code is a slender three lines compared to the original 16; we have removed roughly 80% of the IR for the running example.

DWORD Scratch:[Dword(52)] = flags Dword(4) + esp

esp = Dword(4) + esp + Dword(-88)

DWORD Scratch:[Dword(12)] = flags Dword(4) + esp + Dword(-88)

The side-by-side comparison can be found here.
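A rough Python sketch of the constant and copy propagation step over the temporaries is given below. It is an illustration under stated assumptions rather than the author's implementation: the nested-tuple expression encoding and the names `propagate`, `substitute`, `mentions`, `show` and `is_temp` are all hypothetical, temporaries are assumed to be named `t<number>`, and they are assumed to be dead at the end of the basic block.

```python
import re


def is_temp(name):
    """Assumption: temporaries are exactly the names t0, t1, t2, ..."""
    return re.fullmatch(r"t\d+", name) is not None


def mentions(expr, name):
    """True if expression tree `expr` reads the variable `name`."""
    if expr == ("var", name):
        return True
    return any(mentions(e, name) for e in expr[1:] if isinstance(e, tuple))


def substitute(expr, env):
    """Replace known temporaries in `expr` by their recorded definitions."""
    if expr[0] == "var":
        return env.get(expr[1], expr)
    if expr[0] == "const":
        return expr
    return (expr[0],) + tuple(substitute(e, env) for e in expr[1:])


def propagate(block):
    """Forward-propagate temporaries through a list of (dst, expr) pairs."""
    env, out = {}, []
    for i, (dst, rhs) in enumerate(block):
        rhs = substitute(rhs, env)
        if is_temp(dst):
            env[dst] = rhs          # remember the definition, emit nothing
            continue
        # dst is about to change: temporaries that captured its old value
        # and are still read later must be written out explicitly first.
        later = block[i + 1:]
        stale = [t for t, e in env.items()
                 if mentions(e, dst) and any(mentions(r, t) for _, r in later)]
        for t in stale:
            out.append((t, env.pop(t)))
        out.append((dst, rhs))
    return out


def show(e):
    """Render an expression in the notation used by the listings."""
    if e[0] == "const":
        return f"Dword({e[1]})"
    if e[0] == "var":
        return e[1]
    if e[0] == "add":
        return f"{show(e[1])} + {show(e[2])}"
    return f"flags {show(e[1])}"


def var(n):
    return ("var", n)


BLOCK = [
    ("t3", ("const", 4)),
    ("t4", var("esp")),
    ("t5", ("add", var("t3"), var("t4"))),
    ("DWORD Scratch:[Dword(52)]", ("flags", var("t5"))),
    ("t6", var("t5")),
    ("t7", ("const", -88)),
    ("t8", ("add", var("t6"), var("t7"))),
    ("DWORD Scratch:[Dword(12)]", ("flags", var("t8"))),
    ("esp", var("t8")),
]

for dst, rhs in propagate(BLOCK):
    print(f"{dst} = {show(rhs)}")
```

On the nine-line listing this leaves the same three statements as above, emitted in program order, so the assignment to esp comes last. The sketch performs no algebraic simplification, which is why Dword(4) and Dword(-88) are not folded together.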

The IR now looks closer to x86, with the exception that the results of computations are being stored in the scratch area, not into registers. As before, we apply dead-store elimination together with copy and constant propagation to the scratch area, removing all dependence upon it in the process. See here for a comparison with the last phase.
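Dead-store elimination over the scratch area can be sketched as a single backward pass. The encoding below, a list of (destination, names read) pairs, and the names `eliminate_dead_stores` and `is_scratch` are assumptions for illustration. The sketch also assumes, purely for the sake of the example, that no scratch slot is read after the basic block ends; a real pass would need liveness information from successor blocks. The final `eflags` statement in the sample block is hypothetical and exists only to give one scratch slot a reader.

```python
def is_scratch(name):
    """Assumption: scratch slots are written as 'Scratch[<offset>]'."""
    return name.startswith("Scratch[")


def eliminate_dead_stores(block):
    """Drop scratch stores whose value is never read afterwards.

    `block` is a list of (dst, reads) pairs: the location written and
    the names read by the right-hand side.
    """
    live = set()    # locations read later in the block
    kept = []
    for dst, reads in reversed(block):
        if is_scratch(dst) and dst not in live:
            continue                # overwritten or never read: dead store
        live.discard(dst)           # this statement defines dst...
        live.update(reads)          # ...and needs its operands
        kept.append((dst, reads))
    kept.reverse()
    return kept


SAMPLE = [
    ("Scratch[52]", ("esp",)),          # flags of the first add, never read
    ("Scratch[12]", ("esp",)),          # flags of the second add
    ("esp", ("esp",)),
    ("eflags", ("Scratch[12]",)),       # hypothetical later read
]

for dst, reads in eliminate_dead_stores(SAMPLE):
    print(dst, "<-", ", ".join(reads))
```

Here the store to `Scratch[52]` is dropped because nothing reads it, while the store to `Scratch[12]` survives until copy propagation forwards its value into the statement that reads it.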

Here is a comparison of the final, optimized code against the original x86:

push ebp                               push    ebp                    

ebp = esp                              mov     ebp, esp               

push Dword(-1)                         push    0FFFFFFFFh             

push Dword(4525664)                    push    450E60h                

push Dword(4362952)                    push    offset sub_4292C8      

eax = DWORD FS:[Dword(0)]              mov     eax, large fs:0

push eax                               push    eax                    

DWORD FS:[Dword(0)] = esp              mov     large fs:0, esp        

eflags = flags esp + Dword(-88)                            

esp = esp + Dword(-88)                 add     esp, 0FFFFFFA8h        

push ebx                               push    ebx                    

push esi                               push    esi                    

push edi                               push    edi                    

DWORD SS:[Dword(-24) + ebp] = esp      mov     [ebp-18h], esp         

call DWORD [Dword(4590300)]            call    dword ptr ds:unk_460ADC

vmreturn Dword(0) + Dword(4638392)

Code generation is an afterthought.

Blog Comments

Orr	Posted: Sunday, August 10 2008 06:43.41 CDT
Just got the time to read the entire thing. Amazing ;)

RolfRolles

Posted: Wednesday, January 28 2009 20:42.33 CST

I submitted a paper to the 2009 IEEE Symposium on Security and Privacy on this topic. I didn't have time to finish it, so I submitted an incomplete draft and wasn't surprised when it was rejected. However, I liked this bit of feedback from the anonymous reviewers:

"Looking around the Internet, the paper's main contribution is mostly similar to blog post of Rolf Rolles' blog that talk about manually reverse engineering VMProtect. The authors should include novel work [...]"

:-(

sp	Posted: Thursday, January 29 2009 10:27.01 CST
That's funny, mate. At least for the uninvolved reader. :)

506398911qqcom	Posted: Wednesday, May 18 2011 20:40.35 CDT
Good Job! I love your work for vmp
