📚 OpenRCE is preserved as a read-only archive. Launched at RECon Montreal in 2005. Registration and posting are disabled.








Flag: Tornado! Hurricane!

Blogs >> tagetora's Blog

Created: Tuesday, January 23 2007 13:20.35 CST Modified: Tuesday, January 23 2007 13:51.26 CST
Direct Link, View / Make / Edit Comments
Function Search Engine (or whatever)
Author: tagetora # Views: 1598

Well, here is the first draft about how will work the "Function Search Engine". BTW, I'm looking for a name for this thing (something better that FSE or FUSEN). Okay, here we go!

Concept
--------------

The funtion matching will be using different levels of filters. I mean, there will be a group of first-level filters who will compound the core of the search engine. Also, the 2nd, 3rd... level filters will be used to "refine" the results. Every filter will be taking care of a function attribute, a set of attributes or will represent a relation between parts of the function (here is where we will face with some graph analysis and/or some other abstract representation of the function).

I build a first draft about what goes on every filter-level. It may be considered pre-alpha xD because some attributes will flow from one level to another, new attributes may appear and others may disapear, based in the tests done.

First level filters:
- Set of external references.
- Set of internal constant values.
- Stack size.
- Abstract graph representation

Second level filters:
- Number of parameters.
- Set of types (parameters and/or local variables).


Brief description of main filters
----------------------------------------------------

External References: The data used by the function (strings, constant values, data blocks such as a vector or a matrix, floating point numbers, etc.). The data a function uses says a lot about its functionality. Sometimes a simple Internet search save us a lot of time.

Constant values: Same thing as above but taking care of common used constants like 0xFFFFFFFF, 0x0, etc. that will include a lot of noise in the system. Also this references only constant values that are embeded in the function, I mean, they aren't outside the function's code.

Stack size: Just look at the function preamble (push-mov-sub). It will tell us about local variables size. I know that the same function can use static buffers or allocate them, I'm trying to get the best way to manage this.

Abstract graph: This is the "bindiff-like" graph representation of the function. It will be the hardest part of the implementation process, so we will focus on it once we have a good general desing. I've been talking with the radare developer about this, because he wants to add a graphical diff to his tool. Maybe we can get a base desing in a couple of weeks (I know, my time estimations are not quite accurate, but...)


Testing
--------------

Regarding the tests, at the begining we will be "good boys" and will use short and (of course) well known funcions, but that are very similar. Let's think in what makes different toupper() from tolower(), strcpy() and strncpy(). At the same time, trying that different implementations of these functions match each together.

For the final tests, it will be interesting to use the same function but compiled for different architectures (if the system finally supports it).


That's all folks!
-------------------------

As a final word, only talk about a concept I have been discussing with a friend. We were talking about the abstract graph representation and why not to use a "Divide and Conquer" desing, so we began to discuss about to use all this filters and techniques but with "basic function blocks" instead of taking the entire function. With this approach, we can ask ourselves: the definition of the basic blocks behaviour defines the global function behaviour? I think it's a nice question, but too hard to answer at this moment... or maybe not ;D

Using this "basic block" approach we face with new problems like how to define what's a "basic block", how this blocks change using different compiler flags, if we need to ignore certain types of blocks (like heap init or clean blocks), etc... Well, while I was writing this I see that Ero posted another interesting entry in his blog about "basic blocks" ;D


"I have promises to keep,
And miles to go before I sleep."

Created: Monday, December 18 2006 09:35.58 CST Modified: Monday, December 18 2006 14:02.11 CST
Direct Link, View / Make / Edit Comments
Disassembling in XML
Author: tagetora # Views: 1575

Talking with a friend about hown to obtain a good abstract respresentation of a piece of code, he came up an interesting tool that you can see in action just here. It seems that the tool was born as an advanced hexeditor, but now has a lot of reversing uses/utilities.

At first look, it seemed very strange to me how the intructions are represented in xml. In example, the xor has two source operands and a destination. But if we think that it's not limited to x86, then this fact makes more sense.

I still want to use IDA as a base framework for getting all this "function property abstraction" stuff, but maybe it's time to think about how to embed some external tools (a proxy.plw plugin? Mmm, interesting...). Well, that's all for now, I hope to have soon a first definition of how all this crazy thing will work, and maybe with a bit of "alpha" code to play with ;D

Created: Wednesday, December 13 2006 06:30.50 CST Modified: Wednesday, December 13 2006 08:38.15 CST
Direct Link, View / Make / Edit Comments
who needs backups?
Author: tagetora # Views: 1542

Well, a couple of weeks ago, I started a thread to receive some feedback on a complex topic: create an indexable database of known aka already-analyzed functions.

After that, no more news. Maybe some of you were thinking that I was giving up... well, almost true. Few days after my last post, my smartmon throwed an SMART warning saying "hey, your hdd has an expected lifetime of 0hours" �? 5min later, my root partition was remounted-ro due to read errors and then, just died (very "smart" the smartmon... sorry, this joke got to go).

The funniest thing is that my last backup was an old and dusty 650Mb CD with a few scripts, tools and reports. Of course I'm leaving the recovery mode status, but at very low speed ;D

Ok then, after this "here I am" stuff, only to say that I will try to keep this blog updated with thoughts about the function database idea, collecting interesting concepts, suggestions, bibliography, etc.

That's all folks!


Archived Entries for tagetora
Subject # Views Created On
No archived blog entries found.

There are 31,328 total registered users.


Recently Created Topics
[help] Unpacking VMP...
Mar/12
Reverse Engineering ...
Jul/06
let 'IDAPython' impo...
Sep/24
set 'IDAPython' as t...
Sep/24
GuessType return une...
Sep/20
About retrieving the...
Sep/07
How to find specific...
Aug/15
How to get data depe...
Jul/07
Identify RVA data in...
May/06
Question about memor...
Dec/12


Recent Forum Posts
Finding the procedur...
rolEYder
Question about debbu...
rolEYder
Identify RVA data in...
sohlow
let 'IDAPython' impo...
sohlow
How to find specific...
hackgreti
Problem with ollydbg
sh3dow
How can I write olly...
sh3dow
New LoadMAP plugin v...
mefisto...
Intel pin in loaded ...
djnemo
OOP_RE tool available?
Bl4ckm4n


Recent Blog Entries
halsten
Mar/14
Breaking IonCUBE VM

oleavr
Oct/24
Anatomy of a code tracer

hasherezade
Sep/24
IAT Patcher - new tool for ...

oleavr
Aug/27
CryptoShark: code tracer ba...

oleavr
Jun/25
Build a debugger in 5 minutes

More ...


Recent Blog Comments
nieo on:
Mar/22
IAT Patcher - new tool for ...

djnemo on:
Nov/17
Kernel debugger vs user mod...

acel on:
Nov/14
Kernel debugger vs user mod...

pedram on:
Dec/21
frida.github.io: scriptable...

capadleman on:
Jun/19
Using NtCreateThreadEx for ...

More ...


Imagery
SoySauce Blueprint
Jun 6, 2008

[+] expand

View Gallery (11) / Submit