So I was contemplating the task of deobfuscation of code and realized, as many of you probably do, that it's a function of optimization.
If one minimizes the code path, the algorithm becomes clearer, faster and smaller. This is still rather vague to me, so please excuse my verbosity.
While lots of compilers try to optimize, the real solution for all intents and purposes really is to optimize binary forms.
If one were to trace through the graph as the program is running and generate an output instruction for each new one run (regardless of where it came from in memory); and then minimize it's nodes by appending nodes where they're only entered from one point that would be a helpful algorithm.
The next step would be to remove unnecessary instructions. Trace the graph backwards. For any change in memory or code flow, accept each instruction that eventually affects that change in memory or code flow. You do this backwards so that:
mov al, 3
and cl, al
jmp $+cl
will accept the "mov al, 3" because it affects cl which affects the jump. To do this, you can store a bit for each register and another for each flag to mark it affecting.
This would optimize out purely unnecessary instructions, without considering possible alternatives. It would still be major breakthrough, as you could take ANY program and generate the simplified code flow without junk instructions.
Thoughts?






