📚 OpenRCE is preserved as a read-only archive. Launched at RECon Montreal in 2005. Registration and posting are disabled.









Data Type Analysis

Topic created on: November 13, 2006 05:51 CST by pokopoko.

Hi all,

I've just discovered this site and am finding a wealth of information here; it's fantastic.

I am interested in the process of analysing data types to aid in reverse engineering a binary. I'm not sure if there are already tools that can do this, but basically these are my thoughts:

When we are given a binary we can identify calls to dynamic library functions, and using a tool like IDA we can scan for known library functions that have been statically linked in.

This means that at any point in the program where such a function is called, we know the types of its arguments and return value, whether they are int, char, struct xxx, etc. Then, by tracking how these variables are assigned to registers and memory, we can deduce or guess the type of data being processed at other points in the program.
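A minimal sketch of this idea, assuming a toy instruction format rather than real disassembly. Every name here (`KNOWN_SIGNATURES`, `propagate_types`, the instruction tuples, the `v1`/`v2` value names) is hypothetical, and each location is assumed to hold a single type for the whole function, which real code with reused registers does not guarantee.

```python
# Hypothetical table of library signatures; a real tool would take these
# from import information or signature matching.
KNOWN_SIGNATURES = {
    "strlen": {"args": ["const char *"], "ret": "size_t"},
    "fopen": {"args": ["const char *", "const char *"], "ret": "FILE *"},
}


def propagate_types(instructions, signatures=KNOWN_SIGNATURES):
    """Return {location: type} inferred from calls to known functions.

    Instructions are tuples in a made-up format:
      ("mov", dst, src)                 copy src into dst
      ("call", func, [args], ret_loc)   call func with argument locations
    """
    types = {}
    changed = True
    while changed:  # repeat until no new type facts appear
        changed = False
        for ins in instructions:
            facts = []
            if ins[0] == "call":
                _, func, args, ret = ins
                sig = signatures.get(func)
                if sig is not None:
                    facts = list(zip(args, sig["args"])) + [(ret, sig["ret"])]
            elif ins[0] == "mov":
                _, dst, src = ins
                # A copy links both sides: a type known on either end
                # is assumed to hold for the other end too.
                if src in types:
                    facts.append((dst, types[src]))
                if dst in types:
                    facts.append((src, types[dst]))
            for loc, ty in facts:
                if loc not in types:
                    types[loc] = ty
                    changed = True
    return types


example = [
    ("mov", "v1", "[ebp+8]"),          # load a parameter of unknown type
    ("call", "strlen", ["v1"], "v2"),  # known signature gives two facts
    ("mov", "[ebp-4]", "v2"),          # result stored in a local
]
inferred = propagate_types(example)
```

In this example the call to `strlen` types `v1` and `v2`, and the two copies carry those types to the parameter slot `[ebp+8]` and the local `[ebp-4]`. Conflicting facts are simply ignored here (the first type wins); a real tool would need to report them to the analyst.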

I imagine that a tool could be used (with human interaction) to determine type information for very many unknown functions and variables, through both static analysis and by stepping through the code with a debugger.

What kind of tools are available to assist in this kind of analysis? I have played around with IDA Pro a little, but I'm not sure how far it goes in this area - it seems limited but maybe I have missed something. Are there any projects in progress relating to type analysis?

P.S. I'm new to reverse engineering, so bear with me if this is a beginner question.

  sp     November 14, 2006 01:49.08 CST
I'm not aware of any tool like this except for IDA itself (which AFAIK only infers types of parameters of known library functions). If you want to try to write your own tool there are at least two areas of research where you could get information from: Decompilation theory and type theory (specifically type inference).

In the first case you could check out decompiler sources, for example the source code of the Boomerang decompiler. One of the authors of Boomerang, QuantumG, has a blog where he talked about reconstructing type data from assembly code in the past.

If you want to learn more about the academic area of type inference I suggest looking around at Lambda - The Ultimate or Google. Useful keywords might be type system, type inference, Hindley-Milner, Typed Assembly Language (TAL) and so on. There's also a pretty cool book called "Types and Programming Languages" written by Benjamin C. Pierce.

  pokopoko     November 14, 2006 03:01.15 CST
Thanks sp for your very informative reply, I will take a look at those references. I will probably try to develop at least some sort of prototype tool to help with the current target I'm working on - I foresee that such a tool could be immensely useful.

As a first attempt, I am thinking of writing something which uses pydbg to step through the code and trace flows of data as the program runs, i.e. use a dynamic approach rather than a static analysis.

If I have any success I'll let everyone know and be more than happy to contribute the code back to the community. :D

  slcoleman     November 14, 2006 12:15.08 CST
I have also been interested in this topic for quite some time. I have been looking at using something like gccxml/pygccxml, pointing it at all of the SDK's header files to glean much of the higher-level information, and then using a graph-based dataflow model to propagate the information back wherever possible. Where it is not possible to trace the references, one could take the hierarchy of known types, sizes, and offsets, then analyze all the referenced offsets in the program to present the analyst with a selection list for a given struct or class instance. Once a type is selected, the data type propagation of any accessed sub-elements would begin again from there, until no more matches can be found. Thoughts?
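The selection-list step could be sketched roughly as follows. The layout table is hand-written here as a stand-in for what would be extracted from SDK headers (for example via gccxml), and the names `KNOWN_STRUCTS` and `candidate_structs` are hypothetical. The sketch assumes 4-byte fields and that every observed access is an (offset, size) pair relative to the same base pointer.

```python
# Stand-in for layouts extracted from headers: offset -> (field name, size).
KNOWN_STRUCTS = {
    "POINT": {0: ("x", 4), 4: ("y", 4)},
    "FILETIME": {0: ("dwLowDateTime", 4), 4: ("dwHighDateTime", 4)},
    "RECT": {0: ("left", 4), 4: ("top", 4), 8: ("right", 4), 12: ("bottom", 4)},
}


def candidate_structs(accesses, known=KNOWN_STRUCTS):
    """Return the sorted names of structs that explain every observed access.

    accesses is an iterable of (offset, size) pairs seen relative to one
    base pointer. A struct is a candidate only if each access lands exactly
    on one of its fields with a matching size.
    """
    matches = []
    for name, fields in known.items():
        if all(off in fields and fields[off][1] == size
               for off, size in accesses):
            matches.append(name)
    return sorted(matches)


# Accesses at +0 and +4 are ambiguous, so the analyst gets a list to pick from.
ambiguous = candidate_structs([(0, 4), (4, 4)])
# An extra access at +12 rules out the two 8-byte structs.
narrowed = candidate_structs([(0, 4), (12, 4)])
```

Exact matching is a deliberate simplification. It ignores nested structs, unions, and partial accesses to a field, all of which a fuller tool would have to handle before ranking candidates.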

  pokopoko     November 17, 2006 22:27.39 CST
Yes, that is very much along the lines of what I was thinking. I wasn't aware of gccxml before; it looks like it could be very useful. I like the idea of analysing known structures to create a selection list - were you thinking of a purely static analysis of the program?

I was hoping to sidestep the issue of an untraceable reference (and static analysis in general) by limiting the tool initially to dynamic analysis - just analyzing those references which are dereferenced as the program runs. It's a simplistic model but I thought it would be a good first step.
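A rough sketch of that dynamic model, working on an already recorded trace instead of a live debugger session so that no debugger API has to be assumed. The event format and the name `trace_types` are hypothetical. Because events are processed in execution order, a register that is overwritten loses its label, which a location-based static pass cannot express as easily.

```python
def trace_types(events, signatures):
    """Walk a recorded trace in order and collect the types seen per location.

    Events are tuples in a made-up format:
      ("call", func, [arg locations], ret location)
      ("copy", dst, src)   dst receives the value currently held in src
      ("write", dst)       dst is overwritten by an untracked value
    Returns {location: set of types observed there during the run}.
    """
    current = {}   # location -> type of the value it holds right now
    observed = {}  # location -> every type ever seen there

    def label(loc, ty):
        current[loc] = ty
        observed.setdefault(loc, set()).add(ty)

    for ev in events:
        if ev[0] == "call":
            _, func, args, ret = ev
            sig = signatures.get(func)
            if sig is None:
                current.pop(ret, None)  # unknown callee: result is untyped
                continue
            for loc, ty in zip(args, sig["args"]):
                label(loc, ty)
            label(ret, sig["ret"])
        elif ev[0] == "copy":
            _, dst, src = ev
            if src in current:
                label(dst, current[src])
            else:
                current.pop(dst, None)  # copied an untyped value
        elif ev[0] == "write":
            current.pop(ev[1], None)
    return observed


signatures = {
    "fopen": {"args": ["const char *", "const char *"], "ret": "FILE *"},
}
events = [
    ("call", "fopen", ["[esp]", "[esp+4]"], "eax"),
    ("copy", "[ebp-8]", "eax"),  # handle saved to a local
    ("write", "eax"),            # eax reused for something else
    ("copy", "ecx", "[ebp-8]"),  # handle reloaded later
]
seen = trace_types(events, signatures)
```

Here the `FILE *` label follows the handle from `eax` into the local `[ebp-8]` and on to `ecx`, even though `eax` itself is reused in between. Only paths that actually executed contribute labels, which is the limitation of the purely dynamic model.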

What I imagine would be a very powerful tool is one which uses static analysis to propagate type information wherever possible, as you suggested, and dynamic analysis to follow references which can only be determined at runtime.
