Created: Friday, October 9 2009 22:28.46 CDT
Modified: Saturday, October 10 2009 05:24.10 CDT
Printer Friendly ...
Structure Recovery as Counter-Example Guided Abstraction Refinement
Counter-Example Guided Abstraction Refinement
(C.E.G.A.R. for short) is one of those brilliant techniques, like abstract interpretation, that is simple to state, and that in principle applies beyond its original application domain (in this case, model checking). As this blog entry is meant to be an informal exposition of an idea I had, I feel that it is not warranted to set up the necessary formalism to discuss C.E.G.A.R. in precise mathematical terms, but it would be wrong to convey the notion that it is anything other than a mathematical endeavor.
When you're trying to figure something out in a black-box fashion, you probably do something like the following. You begin by inspecting the data at hand and constructing some sort of model (an "abstraction") that fits it. Then you acquire more data, and test your model against it. If new data causes the model to be falsified (that particular data is called a "counterexample"), you must "refine" your model so as to account for it. This process is iterated until no more counterexamples are found. Basically, this is C.E.G.A.R. -- you can see that the idea enjoys widespread genericity and more or less applies to any scientific endeavor. Notice I didn't talk about software or anything in particular in this paragraph; I'm just talking about thinking.
To be more specific, C.E.G.A.R. is an idea in model checking that is very similar to the preceeding paragraph. Basically, a model (abstraction) is constructed automatically, and standard model-checking techniques are used to verify that the model satifies the specification which has been provided. If it does not, a counterexample is generated. You then analyze the original hardware or software system to see if the counterexample falsifies the specification. If it does, the counterexample is a genuine witness to the fact that the system does not adhere to the specification. If it does not, then we have a "spurious" counterexample (an indication that our abstraction is deficient), which we use to refine our model accordingly, and then we repeat the process.
Once I learned about C.E.G.A.R., I realized that I already adopted a very similar technique while manually reverse engineering object-oriented applications. I begin by assuming that there are not any structures in the program at all. When I encounter an instruction of the form mov reg, [reg+offset], I define a new structure consisting solely of that member, with any bytes below or above that member being considered as undefined. I then track the structure pointer and repeatedly add members every time I encounter a reference of that form.
In terms of C.E.G.A.R.:
1: Begin with an empty structure as the model.
2: Assert that the structure is a valid model of the program's memory accesses with respect to the structure pointer. Model-check the program under this assumption.
3: If a reference to an as-yet undefined structure member is found, this is a counter-example which is a witness to the falsity of the notion that the model accurately depicts the structure under consideration. Therefore, refine the model to incorporate the new structure member, and go to 2.
4: Output the structure.
I don't mean to imply that this is the way that an automated structure recovery tool should be constructed; I think that an interprocedural version of
what Hex-Rays does
with global dataflow analysis ought to be sufficient for most purposes. I'm merely suggesting a mathematically precise way to look at the problem of structure recovery.
Perhaps I'll post an entry about viewing the same problem in terms of the Knaster-Tarski theorem in order theory.
Posted: Saturday, October 10 2009 05:29.07 CDT
Thanks for sharing Rolf.
This technique is really intuitive, and as you said I also found myself doing the same while reconstructing structures.
Posted: Sunday, October 11 2009 04:48.07 CDT
I generally just take a glance at the disassembly and write the structure out in full but this will work too...
Posted: Sunday, October 11 2009 05:22.34 CDT
This reminds me of Popper "Conjectures and refutations", a seminal work in the philosophy of science.
Popper elaborated this 90 years ago ;-)
See e.g. http://www.leonardbeeghley.com/docs/SYG%206125/Popper,%20Science-conjectures%20&%20refutations.pdf
As he said "the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability".
Add New Comment
total registered users.
Recently Created Topics
How to decompile a f...
How to trap mouse cl...
Intel pin in loaded ...
Going to do today wi...
how to create delphi...
enabling menu in a s...
How to get the Image...
OllyDBG Process Term...
Question about debbu...
Recent Forum Posts
Intel pin in loaded ...
OOP_RE tool available?
OOP_RE tool available?
Should binaries be n...
Problem with ollydbg
looking for a softwa...
.orpc section what's...
Pydbg load() issue
Recent Blog Entries
CryptoShark: code tracer ba...
Build a debugger in 5 minutes
frida.re 1.2.0 is out, with...
Android Malware Analysis
Recent Blog Comments
Using NtCreateThreadEx for ...
Branch tracing and LBR acce...
Advanced debugging techniques
2 anti-trace mechanisms spe...
Jun 6, 2008