OpenRCE

About Articles Book Store Distributed RCE Downloads Event Calendar Forums Live Discussion Reference Library RSS Feeds Search Users What's New

Customize Theme

Flag: Tornado! Hurricane!

Created: Friday, October 9 2009 22:28.46 CDT

Modified: Saturday, October 10 2009 05:24.10 CDT

Printer Friendly ...

Structure Recovery as Counter-Example Guided Abstraction Refinement

Author: RolfRolles

# Views: 9589

Counter-Example Guided Abstraction Refinement (C.E.G.A.R. for short) is one of those brilliant techniques, like abstract interpretation, that is simple to state, and that in principle applies beyond its original application domain (in this case, model checking).  As this blog entry is meant to be an informal exposition of an idea I had, I feel that it is not warranted to set up the necessary formalism to discuss C.E.G.A.R. in precise mathematical terms, but it would be wrong to convey the notion that it is anything other than a mathematical endeavor.

When you're trying to figure something out in a black-box fashion, you probably do something like the following.  You begin by inspecting the data at hand and constructing some sort of model (an "abstraction") that fits it.  Then you acquire more data, and test your model against it.  If new data causes the model to be falsified (that particular data is called a "counterexample"), you must "refine" your model so as to account for it.  This process is iterated until no more counterexamples are found.  Basically, this is C.E.G.A.R. -- you can see that the idea enjoys widespread genericity and more or less applies to any scientific endeavor.  Notice I didn't talk about software or anything in particular in this paragraph; I'm just talking about thinking.

To be more specific, C.E.G.A.R. is an idea in model checking that is very similar to the preceeding paragraph.  Basically, a model (abstraction) is constructed automatically, and standard model-checking techniques are used to verify that the model satifies the specification which has been provided.  If it does not, a counterexample is generated.  You then analyze the original hardware or software system to see if the counterexample falsifies the specification.  If it does, the counterexample is a genuine witness to the fact that the system does not adhere to the specification.  If it does not, then we have a "spurious" counterexample (an indication that our abstraction is deficient), which we use to refine our model accordingly, and then we repeat the process.

Once I learned about C.E.G.A.R., I realized that I already adopted a very similar technique while manually reverse engineering object-oriented applications.  I begin by assuming that there are not any structures in the program at all.  When I encounter an instruction of the form mov reg, [reg+offset], I define a new structure consisting solely of that member, with any bytes below or above that member being considered as undefined.  I then track the structure pointer and repeatedly add members every time I encounter a reference of that form.

In terms of C.E.G.A.R.:

1:  Begin with an empty structure as the model.
2:  Assert that the structure is a valid model of the program's memory accesses with respect to the structure pointer.  Model-check the program under this assumption.
3:  If a reference to an as-yet undefined structure member is found, this is a counter-example which is a witness to the falsity of the notion that the model accurately depicts the structure under consideration.  Therefore, refine the model to incorporate the new structure member, and go to 2.
4:  Output the structure.

I don't mean to imply that this is the way that an automated structure recovery tool should be constructed; I think that an interprocedural version of what Hex-Rays does with global dataflow analysis ought to be sufficient for most purposes.  I'm merely suggesting a mathematically precise way to look at the problem of structure recovery.

Perhaps I'll post an entry about viewing the same problem in terms of the Knaster-Tarski theorem in order theory.

Blog Comments

lallous	Posted: Saturday, October 10 2009 05:29.07 CDT
Thanks for sharing Rolf. This technique is really intuitive, and as you said I also found myself doing the same while reconstructing structures.

fileoffset	Posted: Sunday, October 11 2009 04:48.07 CDT
I generally just take a glance at the disassembly and write the structure out in full but this will work too...

melajara	Posted: Sunday, October 11 2009 05:22.34 CDT
This reminds me of Popper "Conjectures and refutations", a seminal work in the philosophy of science. Popper elaborated this 90 years ago ;-) See e.g. http://www.leonardbeeghley.com/docs/SYG%206125/Popper,%20Science-conjectures%20&%20refutations.pdf As he said "the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability". Sounds familiar? Cheers

There are 31,325 total registered users.

Recently Created Topics
	Oct/23
	Oct/23
	Oct/23
	Oct/23
	Oct/23
	Oct/23
	Oct/23
	Oct/23
	Oct/23
	Oct/23

Recent Forum Posts
Reverse Engineering ...	bytecod3r
Reverse Engineering ...	bytecod3r
Reverse Engineering ...	bytecod3r
Reverse Engineering ...	bytecod3r
Reverse Engineering ...	bytecod3r
let 'IDAPython' impo...	bytecod3r
Reverse Engineering ...	bytecod3r
Finding the procedur...	rolEYder
Question about debbu...	rolEYder
Identify RVA data in...	sohlow

Recent Blog Entries
	halsten	Mar/14
Breaking IonCUBE VM
	oleavr	Oct/24
Anatomy of a code tracer
	hasherezade	Sep/24
IAT Patcher - new tool for ...
	oleavr	Aug/27
CryptoShark: code tracer ba...
	oleavr	Jun/25
Build a debugger in 5 minutes
More ...

Recent Blog Comments
	nieo on:	Mar/22
IAT Patcher - new tool for ...
	djnemo on:	Nov/17
Kernel debugger vs user mod...
	acel on:	Nov/14
Kernel debugger vs user mod...
	pedram on:	Dec/21
frida.github.io: scriptable...
	capadleman on:	Jun/19
Using NtCreateThreadEx for ...
More ...

Imagery

SoySauce Blueprint
Jun 6, 2008

[+] expand

View Gallery (11) / Submit