📚 OpenRCE is preserved as a read-only archive. Launched at RECon Montreal in 2005. Registration and posting are disabled.

Topic: Automatically Creating ALIGN Blocks (IDA Pro forum)

Topic created on: July 17, 2006 03:45 CDT by bushing .

I've found that a lot of Windows binaries I disassemble contain long runs of 0x90909090 between functions, and sometimes 0xCCCCCCCC.  (I know that 0x90 is NOP -- why CC?)

It would be desirable to automatically convert these blocks to ALIGN directives, and to recognize that a new object (function or data) usually follows immediately.

In data sections, I also see big blocks of zeroes that could be made into ALIGN blocks.  For many of these situations, IDA is smart enough to "do the right thing" when I go to the start of the "align block" and hit L.  This gets tiresome when analyzing a big binary.  Is there a good way to automate this?

Here's the best I've come up with, and it doesn't always work.  Please feel free to use it, and please give me suggestions for improvement.  Thanks!
Ben

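from idaapi import *
from idc import *

# Walk the unexplored bytes of the segment under the cursor.
seg = getseg(get_screen_ea())
ea = seg.startEA
while ea < seg.endEA:
    ea = find_unknown(ea, 1)  # 1 = search downwards
    # Stop once there are no unexplored bytes left in this segment;
    # otherwise the loop would keep probing BADADDR.
    if ea == BADADDR or ea >= seg.endEA:
        break
    if Byte(ea) == 0x90 or Byte(ea) == 0xCC:
        # Measure the run of identical padding bytes starting at ea.
        start_addr = ea
        start_byte = Byte(ea)
        end_addr = start_addr + 1
        while end_addr < seg.endEA and Byte(end_addr) == start_byte:
            end_addr = end_addr + 1

        print("Found a range of %x at %x, len = %x" % (start_byte, start_addr, end_addr - start_addr))
        # Alignment 0: let IDA determine the alignment itself.
        doAlign(start_addr, end_addr - start_addr, 0)
        ea = end_addr
    # Whatever sits here (after the padding, or the unknown byte itself) is probably code.
    if ea < seg.endEA and isUnknown(getFlags(ea)):
        auto_make_code(ea)
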

This has a few flaws:
* Doesn't handle blocks of 0 inside data segments
* Only finds align blocks inside of unknown parts of code -- if IDA has gone ahead and converted my function into an array of dwords, all I can do is run another script to undefine all the data arrays in the program.
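
Both flaws come from scanning through IDA's view of the bytes rather than the bytes themselves. A minimal sketch of the scanning step as a plain function is shown here. It works on a raw byte buffer, so it does not care whether IDA has already typed the bytes as data, and it treats 0x00 as a padding value as well. The name `find_padding_runs` and its parameters are hypothetical, not part of IDA, and the caller is assumed to supply the segment's raw bytes and base address. Note that zero runs in a data segment may be real data, so the results are candidates only.

```python
def find_padding_runs(data, base=0, pad_values=(0x00, 0x90, 0xCC), min_len=2):
    """Return (start_address, length, value) for each run of one repeated padding byte."""
    buf = bytearray(data)  # indexing a bytearray yields ints
    runs = []
    i = 0
    while i < len(buf):
        value = buf[i]
        if value not in pad_values:
            i += 1
            continue
        # Extend the run while the same byte repeats.
        j = i + 1
        while j < len(buf) and buf[j] == value:
            j += 1
        if j - i >= min_len:
            runs.append((base + i, j - i, value))
        i = j
    return runs


# ret, 3 x NOP, a small function, 2 x int 3, 4 x zero, one data byte
sample = b"\xc3\x90\x90\x90\x55\x8b\xec\xc3\xcc\xcc\x00\x00\x00\x00\x41"
for start, length, value in find_padding_runs(sample, base=0x401000):
    print("run of %02x at %x, len = %x" % (value, start, length))
```

Inside IDA, each returned (start, length) pair could be handed to the same `doAlign` call the script above uses. The `min_len` threshold is a guess meant to skip single stray 0x90 or 0xCC bytes.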

There has to be an easier way to do this, right?
Ben

  igorsk     July 17, 2006 04:19.49 CDT
0xCC = int 3 (breakpoint)

  ryanlrussell     July 17, 2006 20:02.33 CDT
CC is int 3, aka a soft breakpoint.  You usually see these in debug builds.

The alignment won't always help.  You'll notice that for the set at the start of a function, the padding starts where the original function would have were it properly aligned.  Since they are 5 bytes, this usually means that the end of the padding won't be on a good alignment boundary, and the L command frequently doesn't help.  If I really want them marked, I usually do it as 5 bytes of data.

These are function hooks.  Newer MS compilers insert them; I don't know what flags you have to add to get or not get them.  Basically, the 5 bytes are enough for a long JMP, and are a way for MS to throw in a hook for live patching.  The 2 bytes at the start of each function are equivalent to a NOP, which leaves enough room for a short JMP -5; that lands on your long JMP, which can then go wherever you want.
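
The layout described above can be sketched in plain Python. This is an illustration under assumptions, not a description of any Microsoft tool: it assumes 32-bit x86, assumes the 2-byte no-op is `mov edi, edi` (bytes 8B FF), and the names `find_hotpatch_entries` and `build_hotpatch` are hypothetical. The first function looks for 5 identical padding bytes followed by that no-op; the second shows which bytes a patcher would write.

```python
import struct

MOV_EDI_EDI = b"\x8b\xff"  # assumed 2-byte no-op at a hot-patchable entry
PAD_LEN = 5                # room for a 5-byte JMP rel32 before the entry


def find_hotpatch_entries(data, base=0, pad_values=(0x90, 0xCC)):
    """Return addresses where 5 identical padding bytes precede mov edi, edi."""
    buf = bytes(bytearray(data))
    hits = []
    pos = buf.find(MOV_EDI_EDI, PAD_LEN)
    while pos != -1:
        pad = bytearray(buf[pos - PAD_LEN:pos])
        if pad[0] in pad_values and all(b == pad[0] for b in pad):
            hits.append(base + pos)
        pos = buf.find(MOV_EDI_EDI, pos + 1)
    return hits


def build_hotpatch(func_addr, target_addr):
    """Return (bytes to write at func_addr - 5, bytes to write at func_addr)."""
    # The JMP rel32 occupies func_addr-5 .. func_addr-1, so it ends at func_addr
    # and its displacement is measured from there.
    rel32 = (target_addr - func_addr) & 0xFFFFFFFF
    long_jmp = b"\xe9" + struct.pack("<I", rel32)
    # Short JMP back to func_addr-5. The displacement is measured from the end
    # of this 2-byte instruction (func_addr+2), so the encoded value is -7.
    short_jmp = b"\xeb\xf9"
    return long_jmp, short_jmp


# ret, 5 x NOP, hot-patchable prologue, ret, 3 x int 3, stray mov edi, edi
sample = b"\xc3" + b"\x90" * 5 + b"\x8b\xff\x55\x8b\xec\xc3\xcc\xcc\xcc\x8b\xff"
print([hex(a) for a in find_hotpatch_entries(sample, base=0x401000)])
```

The "JMP -5" in the text is the distance from the function entry to the long JMP; the encoded displacement is -7 because x86 short jumps are relative to the next instruction.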

You'll also often see the padding around internal parts of the function that have an SEH record pointing to them.

  bushing     July 24, 2006 18:23.27 CDT
> ryanlrussell:
> The alignment won't always help.  You'll notice that for the set at the start of a function, the padding starts where the original function would have were it properly aligned.  Since they are 5 bytes, this usually means that the end of the padding won't be on a good alignment boundary, and the L command frequently doesn't help.  If I really want them marked, I usually do it as 5 bytes of data.
>
> These are function hooks, [...]

Yuck.  Thanks for the explanation.

Let me take a couple of steps back, then. I'm trying to run IDA on a bloated 20+MB win32 .exe that was, in fact, apparently written in C++ and compiled by MSVC.

The problem: IDA misses a lot of the function starts, and even misses a lot of sections of code.  If I turn all autoanalysis options on and let IDA run to completion (60 mins +) then what I end up with is about 75% of the actual code being properly marked as such, with the rest being made into big data arrays.

The desired solution:  Somehow use these padding areas to signal to IDA that a function starts there.  It would also be nice to hide the padding blocks to make for a cleaner display.
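
One way to picture this idea is a sketch that treats the byte right after each padding run as a candidate function start and notes whether it begins with a common 32-bit MSVC prologue. Everything here is an assumption for illustration: the name `candidate_function_starts` is hypothetical, the prologue list (`push ebp; mov ebp, esp`, optionally preceded by `mov edi, edi`) is a heuristic, and functions without a frame pointer will not match it.

```python
# Byte patterns assumed to be typical function prologues (heuristic only).
PROLOGUES = (
    b"\x55\x8b\xec",          # push ebp; mov ebp, esp
    b"\x8b\xff\x55\x8b\xec",  # mov edi, edi; push ebp; mov ebp, esp
)


def candidate_function_starts(data, base=0, pad_values=(0x90, 0xCC), min_pad=2):
    """Yield (address, looks_like_prologue) for the byte after each padding run."""
    buf = bytearray(data)
    i = 0
    while i < len(buf):
        if buf[i] not in pad_values:
            i += 1
            continue
        # Skip over the run of identical padding bytes.
        j = i
        while j < len(buf) and buf[j] == buf[i]:
            j += 1
        # Only report runs that are long enough and have something after them.
        if j - i >= min_pad and j < len(buf):
            tail = bytes(buf[j:j + 5])
            yield base + j, tail.startswith(PROLOGUES)
        i = j


# ret, 3 x int 3, framed function, 2 x NOP, frameless function, trailing int 3s
sample = b"\xc3\xcc\xcc\xcc\x55\x8b\xec\x5d\xc3\x90\x90\x33\xc0\xc3\xcc\xcc"
for addr, framed in candidate_function_starts(sample, base=0x401000):
    print("%x %s" % (addr, "prologue" if framed else "no known prologue"))
```

Inside IDA, the candidates could be passed to a call such as the `auto_make_code` already used in the script earlier in this thread, starting with the ones that match a prologue.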

Maybe there's a better way to solve that problem, then?
Ben
