OpenRCE
is preserved as a read-only archive. Launched at RECon Montreal in 2005. Registration and posting are disabled.
Created: Friday, December 8 2006 16:38.00 CST
Modified: Sunday, January 21 2007 15:07.01 CST
This is an imported entry.
Simply blocks, basically...
Author:
ero
A few days ago I bumped into something I wasn't really expecting to see. Compiler optimizations have surely come a long way.
Some background first...
After having seen functions split over non-contiguous basic blocks for some time now, it was quite natural to think that some of those basic blocks could be shared among functions (obviously, only the ones leading to the function exit points, since once the shared code is reached there's no way for the flow to get back to basic blocks not shared by those functions).
So we have functions that can be split, with their blocks in different parts of a binary, and some of those blocks shared. The reason for the splitting comes from profiling normal use-cases of the application and trying to group frequently executed code into as few pages of the executable as possible, so that a minimal set of pages needs to be mapped in memory at any one time. Only when infrequently visited code is reached do new pages need to be mapped. The following figures illustrate the concept.
UPDATE: I've just been told that the reason for the splitting is more likely to take advantage of the CPU's internal instruction cache than of memory paging. Keeping the frequently traversed code together results in fewer instruction fetches from (slower) RAM for that code area, and moving the rarely used blocks away also allows more code to fit in the instruction cache.
Here we can see the blocks laid out contiguously in memory, as is normally the case in non-optimized code.
This is how the same function would be laid out once profiling information is incorporated, so that frequently traversed paths sit together within the code (in the same memory page if possible, in order to reduce memory footprint and paging).
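The hot/cold layout described above can be sketched in a few lines. This is a hypothetical illustration, not the actual compiler algorithm: block names, counts, and the threshold are all invented for the example.

```python
# Hypothetical sketch: lay out basic blocks by profile count, so that
# frequently executed ("hot") blocks are grouped together and rarely
# executed ("cold") blocks are moved to a separate region of the binary.

def split_hot_cold(blocks, threshold):
    """blocks: list of (name, execution_count) pairs from profiling."""
    hot = [name for name, count in blocks if count >= threshold]
    cold = [name for name, count in blocks if count < threshold]
    # Hot blocks are emitted contiguously first; cold blocks follow,
    # possibly in a distant part of the binary.
    return hot + cold

profile = [("entry", 1000), ("error_path", 2), ("loop_body", 950), ("exit", 1000)]
print(split_hot_cold(profile, threshold=100))
# → ['entry', 'loop_body', 'exit', 'error_path']
```

The rarely taken error path ends up after all the hot blocks, so the frequent paths share pages (and cache lines) while the cold block can live anywhere.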
Once one has the splitting, the idea of sharing comes naturally.
From the disassembler's point of view, this means one has to allow for those chunks, and also allow each chunk to be assigned to an arbitrary number of "owning" or parent functions.
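A minimal sketch of what such a data model might look like, assuming a chunk is identified by its start address (the class name, addresses, and function names are invented for illustration):

```python
# Hypothetical disassembler-side index: a code chunk (identified here by
# its start address) may be owned by several parent functions at once.

from collections import defaultdict

class ChunkIndex:
    def __init__(self):
        # chunk start address -> set of owning function names
        self.owners = defaultdict(set)

    def add_chunk(self, chunk_ea, func_name):
        self.owners[chunk_ea].add(func_name)

    def owners_of(self, chunk_ea):
        return sorted(self.owners[chunk_ea])

idx = ChunkIndex()
idx.add_chunk(0x401000, "func_a")   # chunk private to func_a
idx.add_chunk(0x401200, "func_a")   # shared tail chunk...
idx.add_chunk(0x401200, "func_b")   # ...also owned by func_b
print(idx.owners_of(0x401200))      # → ['func_a', 'func_b']
```

The point is simply that chunk ownership is a many-to-many relation, not the one-to-one mapping a naive function model assumes.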
What is more interesting, and the subject of this post, is the fact that instructions can also belong to different basic blocks, at least under one view. This arises in cases where extensive optimizations are used.
A couple of days ago I was looking into an optimized binary (the craziest I have seen in a while) and how it mapped into the SQL representation we are using at Sabre, and there were some problems when exporting the information from IDA. (IDA can't really handle heavily-"chunked" code too well (yet), so I have to account for that and build intelligence that analyzes the code for cases like the one I'm discussing here.)
The problem was with two functions sharing a number of basic blocks; the funny part was that, depending on which function one analyzes, the flow among the shared blocks looks different. The cause is fairly obvious once one realizes why the problem appears.
A conditional branch from the non-shared code in one of the functions targeting the shared code will cause a split in the flow, a split which is not present from the other function's point of view. The following figure shows the result of a branch into shared code from only one of the sharing functions.
There are two solutions to this problem. One would be to duplicate the same basic blocks wherever they are used in the binary, which would introduce a non-natural split in a function. The other would be to allow different "views" of the code, using basic blocks simply as a representation of the underlying model (the disassembled instructions), so that different basic blocks could contain the same instructions while accurately representing the flow in each of the two functions.
In the next figure, the colored basic blocks contain the same instructions in both functions, but the flow is different because of the branching.
I'm leaning towards the second approach (the one in the previous figure); our SQL schema should support it trivially, which is fairly neat.
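The "views" approach can be sketched as follows: instructions are stored once as the underlying model, while each function carries its own basic-block ranges over them. All addresses, mnemonics, and block boundaries here are invented for illustration.

```python
# Hypothetical sketch of per-function "views": one shared instruction
# store, with each function defining its own basic blocks as
# half-open [start, end) address ranges over it.

instructions = {  # address -> mnemonic (the single shared model)
    0x10: "mov", 0x11: "add", 0x12: "cmp", 0x13: "ret",
}

# func_a branches into the middle of the shared code, so from its view
# the shared range is split into two blocks at 0x12.
func_a_blocks = [(0x10, 0x12), (0x12, 0x14)]
# func_b falls straight through, so its view is a single block.
func_b_blocks = [(0x10, 0x14)]

def block_instructions(block):
    start, end = block
    return [instructions[ea] for ea in sorted(instructions) if start <= ea < end]

print([block_instructions(b) for b in func_a_blocks])
# → [['mov', 'add'], ['cmp', 'ret']]
print(block_instructions(func_b_blocks[0]))
# → ['mov', 'add', 'cmp', 'ret']
```

Both functions reference exactly the same instructions, yet each gets basic blocks that faithfully reflect its own control flow, which is the property the second solution above is after.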
If you wish to comment on this blog entry, please do so on the
original site
it was imported from.