Function Search Engine (or whatever)
Tora (tagetora) <tagetoragmailcom> Tuesday, January 23 2007 13:20.35 CST


Well, here is the first draft about how will work the "Function Search Engine". BTW, I'm looking for a name for this thing (something better that FSE or FUSEN). Okay, here we go!

Concept
--------------

The funtion matching will be using different levels of filters. I mean, there will be a group of first-level filters who will compound the core of the search engine. Also, the 2nd, 3rd... level filters will be used to "refine" the results. Every filter will be taking care of a function attribute, a set of attributes or will represent a relation between parts of the function (here is where we will face with some graph analysis and/or some other abstract representation of the function).

I build a first draft about what goes on every filter-level. It may be considered pre-alpha xD because some attributes will flow from one level to another, new attributes may appear and others may disapear, based in the tests done.

First level filters:
- Set of external references.
- Set of internal constant values.
- Stack size.
- Abstract graph representation

Second level filters:
- Number of parameters.
- Set of types (parameters and/or local variables).


Brief description of main filters
----------------------------------------------------

External References: The data used by the function (strings, constant values, data blocks such as a vector or a matrix, floating point numbers, etc.). The data a function uses says a lot about its functionality. Sometimes a simple Internet search save us a lot of time.

Constant values: Same thing as above but taking care of common used constants like 0xFFFFFFFF, 0x0, etc. that will include a lot of noise in the system. Also this references only constant values that are embeded in the function, I mean, they aren't outside the function's code.

Stack size: Just look at the function preamble (push-mov-sub). It will tell us about local variables size. I know that the same function can use static buffers or allocate them, I'm trying to get the best way to manage this.

Abstract graph: This is the "bindiff-like" graph representation of the function. It will be the hardest part of the implementation process, so we will focus on it once we have a good general desing. I've been talking with the radare developer about this, because he wants to add a graphical diff to his tool. Maybe we can get a base desing in a couple of weeks (I know, my time estimations are not quite accurate, but...)


Testing
--------------

Regarding the tests, at the begining we will be "good boys" and will use short and (of course) well known funcions, but that are very similar. Let's think in what makes different toupper() from tolower(), strcpy() and strncpy(). At the same time, trying that different implementations of these functions match each together.

For the final tests, it will be interesting to use the same function but compiled for different architectures (if the system finally supports it).


That's all folks!
-------------------------

As a final word, only talk about a concept I have been discussing with a friend. We were talking about the abstract graph representation and why not to use a "Divide and Conquer" desing, so we began to discuss about to use all this filters and techniques but with "basic function blocks" instead of taking the entire function. With this approach, we can ask ourselves: the definition of the basic blocks behaviour defines the global function behaviour? I think it's a nice question, but too hard to answer at this moment... or maybe not ;D

Using this "basic block" approach we face with new problems like how to define what's a "basic block", how this blocks change using different compiler flags, if we need to ignore certain types of blocks (like heap init or clean blocks), etc... Well, while I was writing this I see that Ero posted another interesting entry in his blog about "basic blocks" ;D


"I have promises to keep,
And miles to go before I sleep."

Comments
Posted: Wednesday, December 31 1969 18:00.00 CST