Flag: Tornado! Hurricane!

 Forums >>  Brainstorms - General  >>  My Deobfuscator

Topic created on: December 26, 2007 08:50 CST by bodzcount .

Hi, i am back :)

I have worked a little bit on my ollydbg-tracelog based deobfuscator programm.

A few words how the programm is designed:
- The base structure is a ControlFlowGraph (all template based modular design) where the parser puts in the code that it extracts from the tracelog
- The parser is built using boost::spirit
- The CFG uses a Node and an Edge template class
- The node uses an Instruction template, so that each node stores its own instructions
- The instruction class is processor-specific but has also an generic interface (that can be used for light deobfuscation)
- The instruction class has functions like calculateKnownRegisters and calculateKnownFlags which do the stuff that the name suggests. This information can then late be used by deobfuscation modules

I am thinking about publishing this programm, or make it open source. Is anybody interested to participate?

More details soon.

regards,
Bodz

  bodzcount     December 26, 2007 08:56.32 CST
Some information on the deobfuscation routines i have already implemented.

example: removeFakeJumps
what it does:
- check if last instruction of node is unconditional jump
- check if current node has only ONE outgoing edge
- check if following node (in CFG) has onle ONE incoming edge
- if all applies -> remove jmp instruction and merge nodes

I have about 7 routines implemted so far. With that i can reduce the complexity of an obfuscated function from a popular obfuscator like this:
obfuscated:
CFG has
  1614 nodes
  1724 edges
   5093 instructions
after algorithms:
CFG has
  253 nodes
  363 edges
   3210 instructions

I could have reduced the instructions even more easily, but havent focused on that one, because the number of instructions per node is already quite low and i think CFG simplification is more important now.

I have also implemted a ManagerClass for wingraphviz, so that i can output the cfg as SVG graph file.

regards,
Bodz

  bodzcount     December 26, 2007 09:00.01 CST
You can view the CFG here:
http://www.speedshare.org/2FUPiowFBE

it is svg format, that most new browsers can display. However the problem with the browsers is, that the can not zoom out. A better solution is a vector programm like corel draw.

Edit: I have also a jpg version of the CFG
http://www.speedshare.org/eMJqkOSwGh

regards,
Bodz

  bodzcount     December 26, 2007 09:07.38 CST
I would like to talk now especially about techniques for simplifying CFGs. Does anybody have ideas or knows ressources?

regards,
Bodz

  dELTA     December 26, 2007 12:30.10 CST
Sorry, don't know that, but I'm indeed looking forward to seeing this tool released!

  sovietskicpu     December 26, 2007 21:34.07 CST
"example: removeFakeJumps
what it does:
- check if last instruction of node is unconditional jump
- check if current node has only ONE outgoing edge
- check if following node (in CFG) has onle ONE incoming edge
- if all applies -> remove jmp instruction and merge nodes"

hi bodzcount, do you mean that you are implementing a instructions flow sanitizer ? is your tool able to deal with such conditions of Garbage Emulated JMP (EIP + 1) :

Garbage CALL :

CALL (EIP + SizeOf(CALL_INSTRUCTION) => Call next instruction

or

PUSH OFFSET NEXT
MOV EAX,(EIP + SizeOf(CALL_INSTRUCTION)
CALL EAX => will execute 2 times
RET // 2nd time will do a JMP EIP + 1
@OFFSET NEXT:
....

or

PUSH (EIP + SizeOf(PUSH_INSTRUCTION) + SizeOf(RET_INSTRUCTION))
RET => will execute as JMP EIP + 1

or

MOV EAX,OFFSET NEXT
JMP EAX => will execute JMP EIP + 1
@OFFSET_NEXT:
etc..

is this what your tool is supposed to do ? else to handle ?

  bodzcount     December 27, 2007 02:34.35 CST
Yes, thats what its supposed to do and can do.

Example:
PUSH OFFSET NEXT
MOV EAX,(EIP + SizeOf(CALL_INSTRUCTION)
CALL EAX => will execute 2 times
RET // 2nd time will do a JMP EIP + 1
@OFFSET NEXT:

how does this look in a parsed tracelog (CFG before algorithms except parsing algorithm applied)?

RVA1 PUSH OFFSET NEXT
RVA2 MOV EAX,(EIP + SizeOf(CALL_INSTRUCTION)
RVA3 CALL EAX => (i guess only the ret will be executed twice)
-> new node
RVA4 RET // 2nd time will do a JMP EIP + 1
-> new node
RVA4 RET // 2nd time will do a JMP EIP + 1
-> new node
@OFFSET NEXT: ...

why my program does is the following:
it calculates if a conditional jump (variable target is here a conditional jump too) if determined. if yes, it will be replaced with an unconditional jmp.

how this is done technically?
the processor-specific module has routines calculateRegisters and calculateFlags as described earlier. I didnt mention that is has also a function calculateJumpCondition which takes the Flag and Register information and calculates if this specific jump is determined. I havent yet implemented it for the ret instruction, however this would be possible by introducing the first-stack-entry as another "Register" that will be calculated by calculateRegisters.

Stuff it can handle else?
-remove useless unconditional jumps and merge nodes
-convert fake conditional to unconditional jumps

other stuff are specific patterns like:
push val
pop eax

or

push fake_val
mov eax, val
xchg [esp], val


regards,
Bodz

  bodzcount     December 27, 2007 03:34.45 CST
I have a new idea: "Information Based Deobfuscation"

Information is: Register contents, Flags, Memory
Information scope: current Node ?

some instructions add new information but some instructions also "destroy information"

for example add eax,edx destroys previous flag information S,Z,O,... and register edx but adds information eax

Conclusion:
???

Exercise: How does the algorithm have to look like? ^^

my ideas:
- delete instructions that dont add new information
- delete instructions that add information which is not used later

  VLaaD     December 27, 2007 13:16.23 CST
Perhaps you won't like it, but web technologies are currently using very portable and easy to serialize/deserialize config files; for example - one approach would be using JSON (http://www.json.org/), however, the best solution IMHO is to use plain simple XML to store all the meta data / that could prove multibeneficial (easier search, sorting, SQL server insertion, it's hierarhical, so you can deserialize OOP's objects... etc.)

  bodzcount     December 27, 2007 16:56.21 CST
I like everything that does the job :)

  bodzcount     December 28, 2007 19:44.36 CST
for example this code:

0258ed17  PUSH  ECX
0258ed1d  MOV  <known>ECX, 591A4343
0258ed1e  XOR  <known>ECX, 18149A15
0255d07e  OR  <known>ECX, 54AD660
0255d084  CMP  <known>ECX, 8C96B711
02589d2b  AND  <known>ECX, 32D25C32
02589d31  SUB  <known>ECX, B6E30965
02589d37  TEST  <known>ECX, 40000000
0259bdf8  XOR  <known>ECX, E0B4F70B
0259bdfe  ADD  EDX, ECX
0259be00  POP  ECX

the algorithm could be:
- check if known information can be injected in instruction
(here: inject is only possible in "add edx,ecx")
- check if generated information is not used
(here a lot of instructions just generate ecx and flag states as information, however ecx information is destroyed with "pop ecx" so only flag information is left)
- if flag information is not needed later too, for example popfd could destroy flag information then the following instructions could be removed:
0258ed1d  MOV  <known>ECX, 591A4343
0258ed1e  XOR  <known>ECX, 18149A15
0255d07e  OR  <known>ECX, 54AD660
0255d084  CMP  <known>ECX, 8C96B711
02589d2b  AND  <known>ECX, 32D25C32
02589d31  SUB  <known>ECX, B6E30965
02589d37  TEST  <known>ECX, 40000000
0259bdf8  XOR  <known>ECX, E0B4F70B

so far my thoughts :)

regards,
bodz

  bodzcount     December 31, 2007 05:44.13 CST
I am adding differential deobfuscation functionality right now. Does anybody have experience with this?

PS: since all my stuff is tracelog based, it would be nice to have a very fast generic tracer, something like the nonintrusive debugging stuff. it wouldn't be to much work probably..
Ollydbg does about 300 ins/s on my machine, i think this could be faster...

Guten Rutsch and regards,
bodz

  kioresk     January 13, 2008 10:41.57 CST
Trace log of virtualized code is usefull to understand how code virtualization works and to collect obfuscating patterns.

If you want to devirtualize code, first you need to deobfuscate code in program - not in trace log. ;-)
So a standalone tool that staticly deobfuscate code is a better solution in this case.

Currently, i'm working on it, now it supports only few patterns, i'm planning to finish first fully working version by the end of this month.

Regarding generic code anylizer, i thought about it too:
- convert commands into upperlevel ones
- convert used registers/values from memory to variables and store them in a table
- analyze short parts of code and remove useless instructions and concatenate similiar operations into one

For example, let's look on provided pattern:

PUSH ECX           = save var_1
MOV  ECX, 591A4343 = initialize var_2
XOR  ECX, 18149A15 = change var_2
OR   ECX, 54AD660  = modify var_2
CMP  ECX, 8C96B711 = compare var_2
AND  ECX, 32D25C32 = modify var_2
SUB  ECX, B6E30965 = modify var_2
TEST ECX, 40000000 = compare var_2
XOR  ECX, E0B4F70B = modify var_2
ADD  EDX, ECX      = add var_2 to var_3
POP  ECX           = destroy var_2, restore var_1

But then i decide to use easier way - scan and convert predefined patters. It still contains some kind of analyzer, but it's more "hardcoded" and ofcource easier to code.

  c1de0x   January 15, 2008 00:48.04 CST
bodzcount, kioresk:

this is really similar to what pynary does (and will do more of in the future), except that pynary is purely static at the moment. I.e. it parses flat executables on disk and generates the CFG.

Maybe we should all consider merging our efforts? It'd be great to focus all the effort in one place in order to avoid repetition of labour and push the tool further.

Ping me via pm, email or IRC.

c1de0x

  bodzcount     January 16, 2008 08:07.06 CST
Hi everybody :)
I am on holiday right now in Bukarest... It would be good to bundle the deobfuscation effords somehow. After I come home, i guess in 1 or 2 weeks, i will most likely release my deobfuscation tool (called LNP -last nerd project-). My motivation for this tool was to deobfuscate execryptor, but i think the programm is very modular so it could be used for other stuff as well.
After we have looked into all that available stuff from us, we can maybe decide to put it together somehow... maybe sourceforge...

greetings,
bodz

  c1de0x   January 22, 2008 12:05.02 CST
bodz:

Give me a ping when you return (pm, IRC, googletalk)... I'd love to do some collaboration.

If you're looking for somewhere to host your code, consider http://openrce-snippets.googlecode.com ;)

  bodzcount     January 26, 2008 08:25.08 CST
Hi,
i am back from holidays :)

A little preview can be seen here:

http://bodzcount.110mb.com/

I still don't know under which license I should publish the sources. Any idea what would be the best for it?

regards,
Bodz

  nico     February 9, 2008 08:09.16 CST
depends of what you want people to do or not do with your stuff i guess.. but a lot of people don't care much about licenses too

  bodzcount     February 10, 2008 08:15.30 CST
I guess the latest GPL would be ok. I dont want that anybody just takes my code and makes money out of it :)

Note: Registration is required to post to the forums.

There are 31,313 total registered users.


Recently Created Topics
[help] Unpacking VMP...
Mar/12
Reverse Engineering ...
Jul/06
hi!
Jul/01
let 'IDAPython' impo...
Sep/24
set 'IDAPython' as t...
Sep/24
GuessType return une...
Sep/20
About retrieving the...
Sep/07
How to find specific...
Aug/15
How to get data depe...
Jul/07
Identify RVA data in...
May/06


Recent Forum Posts
Finding the procedur...
rolEYder
Question about debbu...
rolEYder
Identify RVA data in...
sohlow
let 'IDAPython' impo...
sohlow
How to find specific...
hackgreti
Problem with ollydbg
sh3dow
How can I write olly...
sh3dow
New LoadMAP plugin v...
mefisto...
Intel pin in loaded ...
djnemo
OOP_RE tool available?
Bl4ckm4n


Recent Blog Entries
halsten
Mar/14
Breaking IonCUBE VM

oleavr
Oct/24
Anatomy of a code tracer

hasherezade
Sep/24
IAT Patcher - new tool for ...

oleavr
Aug/27
CryptoShark: code tracer ba...

oleavr
Jun/25
Build a debugger in 5 minutes

More ...


Recent Blog Comments
nieo on:
Mar/22
IAT Patcher - new tool for ...

djnemo on:
Nov/17
Kernel debugger vs user mod...

acel on:
Nov/14
Kernel debugger vs user mod...

pedram on:
Dec/21
frida.github.io: scriptable...

capadleman on:
Jun/19
Using NtCreateThreadEx for ...

More ...


Imagery
SoySauce Blueprint
Jun 6, 2008

[+] expand

View Gallery (11) / Submit