This might be of interest to anyone who wants to make reverse
engineering of their software much harder (authors of shareware, for
example). The technique I've used is to scramble the code at compile
time. This is more powerful than generic scramblers because the code
stays scrambled in memory while executing. It is almost impossible to
follow the program's algorithm using a disassembler such as IDA. All
cross-references in IDA are gone, and automatic disassembly is
constantly tricked by the random opcode insertion.
I have added scrambling of both generated code and data to the Tiny C
Compiler. I targeted this compiler because it has a very small footprint
(~150K), can produce both PE and ELF executables without the help of any
linker, and is fully ANSI compliant.
You can obfuscate all calls and long jumps, the parameters passed to
external functions, the parameters passed to local functions, the stack,
and the function prologs and epilogs. You can also encrypt your code's
data section by XORing it with an LFSR.
http://uglyduck.ath.cx/ep/archive/2010/04/Tiny_C_Scrambling_Compiler.html
With the '-x' switch the compiler heavily pollutes the generated code,
making it larger and slower. The purpose of this operation is to
obfuscate the generated code and make reverse engineering harder.
The '-x' switch by itself enables all the scramble options. You may
select individual scrambling features by listing them after the '-x':
- 'c' obfuscate all calls
- 'j' obfuscate all long jumps
- 'f' obfuscate parameters passed to external (library) functions
- 'p' obfuscate parameters passed to local functions
- 's' obfuscate the stack (size of local variables and their references)
- 'b' obfuscate function prologs
- 'e' obfuscate function epilogs (returns are replaced with jumps)
- 'd' encrypt data segment with an LFSR
The LFSR initial value and the unscrambling code are different with
every compile. TCC places read-only objects in the data section (rather
than rodata), so all your strings will be encrypted.
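
To illustrate what XORing a data section with an LFSR keystream looks
like, here is a minimal sketch. It is not the patch's code: the 16-bit
width, the Galois form, the tap mask 0xB400 and the names lfsr_step and
lfsr_xor are all assumptions chosen for the example. It only shows that
the same routine scrambles and unscrambles when given the same seed.

```c
#include <stddef.h>
#include <stdint.h>

/* Advance a 16-bit Galois LFSR by one bit.
   The tap mask 0xB400 (x^16 + x^14 + x^13 + x^11 + 1) is an assumed,
   textbook choice; the real patch may use another width or polynomial. */
uint16_t lfsr_step(uint16_t s)
{
    return (uint16_t)((s >> 1) ^ ((s & 1u) ? 0xB400u : 0u));
}

/* XOR `len` bytes of `buf` with a keystream drawn from the LFSR.
   `seed` must be non-zero, otherwise the register stays at zero.
   Calling this twice with the same seed restores the original bytes. */
void lfsr_xor(uint8_t *buf, size_t len, uint16_t seed)
{
    uint16_t s = seed;
    size_t i;
    int b;

    for (i = 0; i < len; i++) {
        uint8_t k = 0;
        for (b = 0; b < 8; b++) {       /* collect 8 output bits per byte */
            k = (uint8_t)((k << 1) | (s & 1u));
            s = lfsr_step(s);
        }
        buf[i] ^= k;
    }
}
```

In a scheme like the one described, the compiler would apply such a
routine to the data section once at build time with a fresh seed, and the
startup code emitted into the executable would apply it again before the
program touches its data.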
Code obfuscation is mainly achieved by inserting random data between
genuine operations. This tricks disassemblers because they try to
disassemble the random data, and they miss real opcodes when a
variable-size garbage instruction swallows the bytes that follow it.
All addressing is changed to offset addressing using a variable base
(usually in ebx). This prevents disassemblers from generating any
cross-references for either functions or data.
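
The two ideas above can be sketched as follows. This is an illustration
only, under stated assumptions: it assumes the random bytes are skipped
at run time by a short jump (the text does not say how they are
skipped), and the helper names emit_junk, hide_address and
recover_address are hypothetical. rand() stands in for whatever random
source the patch really uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: write an x86 short jump (opcode EB, rel8) followed
   by `n` random filler bytes into `out`. Returns the number of bytes
   written (n + 2). `n` must be at most 127 because rel8 is signed.
   At run time the jump skips the filler; a disassembler that decodes
   linearly still tries to interpret it as instructions. */
size_t emit_junk(uint8_t *out, uint8_t n)
{
    size_t i;

    out[0] = 0xEB;                      /* JMP rel8 */
    out[1] = n;                         /* skip the next n bytes */
    for (i = 0; i < n; i++)
        out[2 + i] = (uint8_t)(rand() & 0xFF);  /* filler, never executed */
    return (size_t)n + 2;
}

/* Hypothetical illustration of base+offset addressing: only `base` and
   `delta` appear in the code, never the target address itself, so a
   static tool has no direct reference to follow. Unsigned arithmetic
   wraps, so any delta works. */
uintptr_t hide_address(uintptr_t target, uintptr_t delta)
{
    return target - delta;
}

uintptr_t recover_address(uintptr_t base, uintptr_t delta)
{
    return base + delta;
}
```

A code generator could call something like emit_junk between genuine
instructions with a different length each time, and load a value like
the one returned by hide_address into a register such as ebx so that
every call or data access is expressed relative to it.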
The scrambling functionality is a patch against a stripped-down version
of tcc 0.9.25 which handles only i386 code. Both the Linux version and
the cross-compiled version which generates Windows code work.
The current release passes tcctest with all scrambling switches enabled.
If you try to build this compiler on a Windows platform, the build will
probably fail (mainly because I'm using /dev/urandom). I have no plans
to make it work on Windows because I'm not interested in that platform.
You can generate Windows code from Linux with the cross-compiler.
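
For readers wondering why /dev/urandom ties the build to Unix-like
hosts, here is a minimal sketch of reading random bytes from it. It is
not taken from the patch; the function name read_random_bytes and its
signature are assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Fill `buf` with `n` bytes read from `path` (for example
   "/dev/urandom"). Returns 0 on success, or -1 if the file cannot be
   opened or fewer than `n` bytes could be read. */
int read_random_bytes(const char *path, void *buf, size_t n)
{
    FILE *f = fopen(path, "rb");
    size_t got;

    if (f == NULL)
        return -1;              /* no such device, e.g. on Windows */
    got = fread(buf, 1, n, f);
    fclose(f);
    return got == n ? 0 : -1;
}
```

On a host without that device file the open simply fails, so a port
would need to substitute another random source at this point.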
Generating static executables was broken on my Debian system (with stock
tcc 0.9.25), so I've patched this version to use dietlibc. This has the
advantage of making small executables which run on any kernel version
(the bloated libc checks kernel versions and refuses to run even if you
don't need any of the 2.6 functionality). The -run switch (used for C
scripting) now creates (in memory) static versions of your C programs.
This is faster and the program occupies less space. If you don't want
statically linked scripts, you'll have to use -rdynamic with the -run
switch.







