Flag: Tornado!
Hurricane!
|
|
Topic created on: December 26, 2007 08:50 CST by bodzcount .
Hi, i am back :)
I have worked a little bit on my ollydbg-tracelog based deobfuscator programm.
A few words how the programm is designed:
- The base structure is a ControlFlowGraph (all template based modular design) where the parser puts in the code that it extracts from the tracelog
- The parser is built using boost::spirit
- The CFG uses a Node and an Edge template class
- The node uses an Instruction template, so that each node stores its own instructions
- The instruction class is processor-specific but has also an generic interface (that can be used for light deobfuscation)
- The instruction class has functions like calculateKnownRegisters and calculateKnownFlags which do the stuff that the name suggests. This information can then late be used by deobfuscation modules
I am thinking about publishing this programm, or make it open source. Is anybody interested to participate?
More details soon.
regards,
Bodz
Some information on the deobfuscation routines i have already implemented.
example: removeFakeJumps
what it does:
- check if last instruction of node is unconditional jump
- check if current node has only ONE outgoing edge
- check if following node (in CFG) has onle ONE incoming edge
- if all applies -> remove jmp instruction and merge nodes
I have about 7 routines implemted so far. With that i can reduce the complexity of an obfuscated function from a popular obfuscator like this:
obfuscated:
CFG has
1614 nodes
1724 edges
5093 instructions
after algorithms:
CFG has
253 nodes
363 edges
3210 instructions
I could have reduced the instructions even more easily, but havent focused on that one, because the number of instructions per node is already quite low and i think CFG simplification is more important now.
I have also implemted a ManagerClass for wingraphviz, so that i can output the cfg as SVG graph file.
regards,
Bodz
|
I would like to talk now especially about techniques for simplifying CFGs. Does anybody have ideas or knows ressources?
regards,
Bodz
|
Sorry, don't know that, but I'm indeed looking forward to seeing this tool released!
|
"example: removeFakeJumps
what it does:
- check if last instruction of node is unconditional jump
- check if current node has only ONE outgoing edge
- check if following node (in CFG) has onle ONE incoming edge
- if all applies -> remove jmp instruction and merge nodes"
hi bodzcount, do you mean that you are implementing a instructions flow sanitizer ? is your tool able to deal with such conditions of Garbage Emulated JMP (EIP + 1) :
Garbage CALL :
CALL (EIP + SizeOf(CALL_INSTRUCTION) => Call next instruction
or
PUSH OFFSET NEXT
MOV EAX,(EIP + SizeOf(CALL_INSTRUCTION)
CALL EAX => will execute 2 times
RET // 2nd time will do a JMP EIP + 1
@OFFSET NEXT:
....
or
PUSH (EIP + SizeOf(PUSH_INSTRUCTION) + SizeOf(RET_INSTRUCTION))
RET => will execute as JMP EIP + 1
or
MOV EAX,OFFSET NEXT
JMP EAX => will execute JMP EIP + 1
@OFFSET_NEXT:
etc..
is this what your tool is supposed to do ? else to handle ?
|
Yes, thats what its supposed to do and can do.
Example:
PUSH OFFSET NEXT
MOV EAX,(EIP + SizeOf(CALL_INSTRUCTION)
CALL EAX => will execute 2 times
RET // 2nd time will do a JMP EIP + 1
@OFFSET NEXT:
how does this look in a parsed tracelog (CFG before algorithms except parsing algorithm applied)?
RVA1 PUSH OFFSET NEXT
RVA2 MOV EAX,(EIP + SizeOf(CALL_INSTRUCTION)
RVA3 CALL EAX => (i guess only the ret will be executed twice)
-> new node
RVA4 RET // 2nd time will do a JMP EIP + 1
-> new node
RVA4 RET // 2nd time will do a JMP EIP + 1
-> new node
@OFFSET NEXT: ...
why my program does is the following:
it calculates if a conditional jump (variable target is here a conditional jump too) if determined. if yes, it will be replaced with an unconditional jmp.
how this is done technically?
the processor-specific module has routines calculateRegisters and calculateFlags as described earlier. I didnt mention that is has also a function calculateJumpCondition which takes the Flag and Register information and calculates if this specific jump is determined. I havent yet implemented it for the ret instruction, however this would be possible by introducing the first-stack-entry as another "Register" that will be calculated by calculateRegisters.
Stuff it can handle else?
-remove useless unconditional jumps and merge nodes
-convert fake conditional to unconditional jumps
other stuff are specific patterns like:
push val
pop eax
or
push fake_val
mov eax, val
xchg [esp], val
regards,
Bodz
|
I have a new idea: "Information Based Deobfuscation"
Information is: Register contents, Flags, Memory
Information scope: current Node ?
some instructions add new information but some instructions also "destroy information"
for example add eax,edx destroys previous flag information S,Z,O,... and register edx but adds information eax
Conclusion:
???
Exercise: How does the algorithm have to look like? ^^
my ideas:
- delete instructions that dont add new information
- delete instructions that add information which is not used later
|
Perhaps you won't like it, but web technologies are currently using very portable and easy to serialize/deserialize config files; for example - one approach would be using JSON (http://www.json.org/), however, the best solution IMHO is to use plain simple XML to store all the meta data / that could prove multibeneficial (easier search, sorting, SQL server insertion, it's hierarhical, so you can deserialize OOP's objects... etc.)
|
I like everything that does the job :)
|
for example this code:
0258ed17 PUSH ECX
0258ed1d MOV <known>ECX, 591A4343
0258ed1e XOR <known>ECX, 18149A15
0255d07e OR <known>ECX, 54AD660
0255d084 CMP <known>ECX, 8C96B711
02589d2b AND <known>ECX, 32D25C32
02589d31 SUB <known>ECX, B6E30965
02589d37 TEST <known>ECX, 40000000
0259bdf8 XOR <known>ECX, E0B4F70B
0259bdfe ADD EDX, ECX
0259be00 POP ECX
the algorithm could be:
- check if known information can be injected in instruction
(here: inject is only possible in "add edx,ecx")
- check if generated information is not used
(here a lot of instructions just generate ecx and flag states as information, however ecx information is destroyed with "pop ecx" so only flag information is left)
- if flag information is not needed later too, for example popfd could destroy flag information then the following instructions could be removed:
0258ed1d MOV <known>ECX, 591A4343
0258ed1e XOR <known>ECX, 18149A15
0255d07e OR <known>ECX, 54AD660
0255d084 CMP <known>ECX, 8C96B711
02589d2b AND <known>ECX, 32D25C32
02589d31 SUB <known>ECX, B6E30965
02589d37 TEST <known>ECX, 40000000
0259bdf8 XOR <known>ECX, E0B4F70B
so far my thoughts :)
regards,
bodz
|
I am adding differential deobfuscation functionality right now. Does anybody have experience with this?
PS: since all my stuff is tracelog based, it would be nice to have a very fast generic tracer, something like the nonintrusive debugging stuff. it wouldn't be to much work probably..
Ollydbg does about 300 ins/s on my machine, i think this could be faster...
Guten Rutsch and regards,
bodz
|
Trace log of virtualized code is usefull to understand how code virtualization works and to collect obfuscating patterns.
If you want to devirtualize code, first you need to deobfuscate code in program - not in trace log. ;-)
So a standalone tool that staticly deobfuscate code is a better solution in this case.
Currently, i'm working on it, now it supports only few patterns, i'm planning to finish first fully working version by the end of this month.
Regarding generic code anylizer, i thought about it too:
- convert commands into upperlevel ones
- convert used registers/values from memory to variables and store them in a table
- analyze short parts of code and remove useless instructions and concatenate similiar operations into one
For example, let's look on provided pattern:
PUSH ECX = save var_1
MOV ECX, 591A4343 = initialize var_2
XOR ECX, 18149A15 = change var_2
OR ECX, 54AD660 = modify var_2
CMP ECX, 8C96B711 = compare var_2
AND ECX, 32D25C32 = modify var_2
SUB ECX, B6E30965 = modify var_2
TEST ECX, 40000000 = compare var_2
XOR ECX, E0B4F70B = modify var_2
ADD EDX, ECX = add var_2 to var_3
POP ECX = destroy var_2, restore var_1
But then i decide to use easier way - scan and convert predefined patters. It still contains some kind of analyzer, but it's more "hardcoded" and ofcource easier to code.
|
bodzcount, kioresk:
this is really similar to what pynary does (and will do more of in the future), except that pynary is purely static at the moment. I.e. it parses flat executables on disk and generates the CFG.
Maybe we should all consider merging our efforts? It'd be great to focus all the effort in one place in order to avoid repetition of labour and push the tool further.
Ping me via pm, email or IRC.
c1de0x
|
Hi everybody :)
I am on holiday right now in Bukarest... It would be good to bundle the deobfuscation effords somehow. After I come home, i guess in 1 or 2 weeks, i will most likely release my deobfuscation tool (called LNP -last nerd project-). My motivation for this tool was to deobfuscate execryptor, but i think the programm is very modular so it could be used for other stuff as well.
After we have looked into all that available stuff from us, we can maybe decide to put it together somehow... maybe sourceforge...
greetings,
bodz
|
bodz:
Give me a ping when you return (pm, IRC, googletalk)... I'd love to do some collaboration.
If you're looking for somewhere to host your code, consider http://openrce-snippets.googlecode.com ;)
|
Hi,
i am back from holidays :)
A little preview can be seen here:
http://bodzcount.110mb.com/
I still don't know under which license I should publish the sources. Any idea what would be the best for it?
regards,
Bodz
|
depends of what you want people to do or not do with your stuff i guess.. but a lot of people don't care much about licenses too
|
I guess the latest GPL would be ok. I dont want that anybody just takes my code and makes money out of it :)
|
Note: Registration is required to post to the forums.
|
|
|
There are 31,313 total registered users.
|
|