Flag: Tornado! Hurricane!

Blogs >> RolfRolles's Blog

Created: Friday, October 9 2009 22:28.46 CDT Modified: Saturday, October 10 2009 05:24.10 CDT
Printer Friendly ...
Structure Recovery as Counter-Example Guided Abstraction Refinement
Author: RolfRolles # Views: 9313

Counter-Example Guided Abstraction Refinement (C.E.G.A.R. for short) is one of those brilliant techniques, like abstract interpretation, that is simple to state, and that in principle applies beyond its original application domain (in this case, model checking).  As this blog entry is meant to be an informal exposition of an idea I had, I feel that it is not warranted to set up the necessary formalism to discuss C.E.G.A.R. in precise mathematical terms, but it would be wrong to convey the notion that it is anything other than a mathematical endeavor.

When you're trying to figure something out in a black-box fashion, you probably do something like the following.  You begin by inspecting the data at hand and constructing some sort of model (an "abstraction") that fits it.  Then you acquire more data, and test your model against it.  If new data causes the model to be falsified (that particular data is called a "counterexample"), you must "refine" your model so as to account for it.  This process is iterated until no more counterexamples are found.  Basically, this is C.E.G.A.R. -- you can see that the idea enjoys widespread genericity and more or less applies to any scientific endeavor.  Notice I didn't talk about software or anything in particular in this paragraph; I'm just talking about thinking.

To be more specific, C.E.G.A.R. is an idea in model checking that is very similar to the preceeding paragraph.  Basically, a model (abstraction) is constructed automatically, and standard model-checking techniques are used to verify that the model satifies the specification which has been provided.  If it does not, a counterexample is generated.  You then analyze the original hardware or software system to see if the counterexample falsifies the specification.  If it does, the counterexample is a genuine witness to the fact that the system does not adhere to the specification.  If it does not, then we have a "spurious" counterexample (an indication that our abstraction is deficient), which we use to refine our model accordingly, and then we repeat the process.

Once I learned about C.E.G.A.R., I realized that I already adopted a very similar technique while manually reverse engineering object-oriented applications.  I begin by assuming that there are not any structures in the program at all.  When I encounter an instruction of the form mov reg, [reg+offset], I define a new structure consisting solely of that member, with any bytes below or above that member being considered as undefined.  I then track the structure pointer and repeatedly add members every time I encounter a reference of that form.

In terms of C.E.G.A.R.:

1:  Begin with an empty structure as the model.
2:  Assert that the structure is a valid model of the program's memory accesses with respect to the structure pointer.  Model-check the program under this assumption.
3:  If a reference to an as-yet undefined structure member is found, this is a counter-example which is a witness to the falsity of the notion that the model accurately depicts the structure under consideration.  Therefore, refine the model to incorporate the new structure member, and go to 2.
4:  Output the structure.

I don't mean to imply that this is the way that an automated structure recovery tool should be constructed; I think that an interprocedural version of what Hex-Rays does with global dataflow analysis ought to be sufficient for most purposes.  I'm merely suggesting a mathematically precise way to look at the problem of structure recovery.

Perhaps I'll post an entry about viewing the same problem in terms of the Knaster-Tarski theorem in order theory.


Blog Comments
lallous Posted: Saturday, October 10 2009 05:29.07 CDT
Thanks for sharing Rolf.

This technique is really intuitive, and as you said I also found myself doing the same while reconstructing structures.

fileoffset Posted: Sunday, October 11 2009 04:48.07 CDT
I generally just take a glance at the disassembly and write the structure out in full but this will work too...

melajara Posted: Sunday, October 11 2009 05:22.34 CDT
This reminds me of Popper "Conjectures and refutations", a seminal work in the philosophy of science.

Popper elaborated this 90 years ago ;-)

See e.g. http://www.leonardbeeghley.com/docs/SYG%206125/Popper,%20Science-conjectures%20&%20refutations.pdf

As he said "the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability".

Sounds familiar?

Cheers



Add New Comment
Comment:









There are 31,310 total registered users.


Recently Created Topics
[help] Unpacking VMP...
Mar/12
Reverse Engineering ...
Jul/06
hi!
Jul/01
let 'IDAPython' impo...
Sep/24
set 'IDAPython' as t...
Sep/24
GuessType return une...
Sep/20
About retrieving the...
Sep/07
How to find specific...
Aug/15
How to get data depe...
Jul/07
Identify RVA data in...
May/06


Recent Forum Posts
Finding the procedur...
rolEYder
Question about debbu...
rolEYder
Identify RVA data in...
sohlow
let 'IDAPython' impo...
sohlow
How to find specific...
hackgreti
Problem with ollydbg
sh3dow
How can I write olly...
sh3dow
New LoadMAP plugin v...
mefisto...
Intel pin in loaded ...
djnemo
OOP_RE tool available?
Bl4ckm4n


Recent Blog Entries
halsten
Mar/14
Breaking IonCUBE VM

oleavr
Oct/24
Anatomy of a code tracer

hasherezade
Sep/24
IAT Patcher - new tool for ...

oleavr
Aug/27
CryptoShark: code tracer ba...

oleavr
Jun/25
Build a debugger in 5 minutes

More ...


Recent Blog Comments
nieo on:
Mar/22
IAT Patcher - new tool for ...

djnemo on:
Nov/17
Kernel debugger vs user mod...

acel on:
Nov/14
Kernel debugger vs user mod...

pedram on:
Dec/21
frida.github.io: scriptable...

capadleman on:
Jun/19
Using NtCreateThreadEx for ...

More ...


Imagery
SoySauce Blueprint
Jun 6, 2008

[+] expand

View Gallery (11) / Submit