📚 OpenRCE is preserved as a read-only archive. Launched at RECon Montreal in 2005. Registration and posting are disabled.









Created: Friday, January 19 2007 00:40.35 CST Modified: Friday, January 19 2007 11:07.06 CST
Binary Instruction Word Clouds
Author: codypierce # Views: 4006

I have been working on an x86 emulator in Python recently, and before starting I did some research into how many of the hundreds of x86 instructions are actually used in a real-world binary.  The results weren't surprising: only a handful are *really* used.  I'd say 30 or so are used 80% of the time.  With that in mind, I thought it would be interesting to use the popular "word cloud" data representation to display those instructions.  The word cloud is simple: the more often an instruction occurs, the heavier its weight (font size).  Since this blog won't let me add the real page, here is an image.

XP SP2 kernel32.dll (961K)



Click here for page that includes counts

XP SP2 shell32.dll (8256K)



Click here for page that includes counts

Kind of a novel idea.  I suppose you could also do something similar with other data, such as representing heap chunks by address and weighting them by access count, or Windows API calls weighted by how often they are used.
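
The weighting idea can be sketched in a few lines of Python. This is only an illustration, not the script used to produce the images above: it assumes a plain-text listing with one instruction per line and the mnemonic as the first token, and the function names (`count_mnemonics`, `font_sizes`, `render_cloud`) and the 10-48 px range are made up for the example. Prefixes such as `rep` would be counted as their own mnemonic under this assumption.

```python
from collections import Counter
from html import escape


def count_mnemonics(listing):
    """Count mnemonics in a listing with one instruction per line.

    Assumption: the mnemonic is the first whitespace-separated token.
    """
    counts = Counter()
    for line in listing.splitlines():
        tokens = line.split()
        if tokens:
            counts[tokens[0].lower()] += 1
    return counts


def font_sizes(counts, min_px=10, max_px=48):
    """Map each count linearly onto a font size between min_px and max_px."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        mnemonic: round(min_px + (count - lo) * (max_px - min_px) / span)
        for mnemonic, count in counts.items()
    }


def render_cloud(counts):
    """Return an HTML paragraph with one sized <span> per mnemonic."""
    sizes = font_sizes(counts)
    spans = [
        f'<span style="font-size:{sizes[m]}px" title="{c}">{escape(m)}</span>'
        for m, c in sorted(counts.items())
    ]
    return "<p>" + " ".join(spans) + "</p>"


if __name__ == "__main__":
    sample = "mov eax, ebx\nmov ecx, 1\npush eax\ncall foo\nretn\n"
    print(render_cloud(count_mnemonics(sample)))
```

The same `font_sizes` mapping would work for the other ideas mentioned, as long as the input is a dictionary of names to counts (for example API names to call counts).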


Blog Comments
sp Posted: Friday, January 19 2007 01:54.12 CST
It's only tangentially related, but at Black Hat USA 2006 Daniel Bilar gave a presentation where he tried to distinguish between malware and regular programs using the assembly instruction distribution of files (among other things). There are lots of opcode distribution stats in his slides.

http://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Bilar.pdf

Piotr Posted: Friday, January 19 2007 07:08.20 CST
Well, it's an old topic, if you mean opcode frequency statistics. Even virus writers attempted to research this topic a few years ago (e.g. Z0mbie - Opcode Frequency Statistics, although MISTFALL had a limited disassembler).

Moreover, a lot of AV engines already include opcode frequency statistics as part of their heuristics. For example, an engine automatically switches to deep-scan mode if it finds suspicious instructions like sti/cli... bla bla (of course, assume we are in r3 in this example).

cheers.

otto Posted: Friday, January 19 2007 07:35.38 CST
It is interesting that there are 4-8 times as many CALLs as there are RETNs. Are there other causes for this than imported functions?

sp Posted: Friday, January 19 2007 07:38.44 CST
otto: A function can be called from multiple locations but most functions compiled by regular compilers have just one ret instruction (following the single entry, single exit principle).

call foo
call foo
call foo

proc foo:
ret

3 times more calls than rets. :)

otto Posted: Friday, January 19 2007 07:45.31 CST
What was I thinking! Of course that's the reason. I guess I thought they were some run-time statistics (e.g. from an emulator).  *shame on me*

Piotr Posted: Friday, January 19 2007 07:47.00 CST
And in case you haven't seen it, here is an opcode frequency counter from my Aslan project: http://piotrbania.com/all/4514N/a1.jpg.

Note: there are a few different forms of the same instruction, so they are split, which leaves the same name listed several times, once per form.


pedram Posted: Friday, January 19 2007 10:42.53 CST
Piotr: Well, it's an old topic, if you mean opcode frequency statistics. Even virus writers attempted to research this topic a few years ago (e.g. Z0mbie - Opcode Frequency Statistics, although MISTFALL had a limited disassembler).

I don't think he meant that analyzing opcode frequencies in general is new/cool, just the way of showing it with the HTML word cloud.

Piotr Posted: Friday, January 19 2007 10:49.05 CST
Pedram:
Ah well, but I still prefer suns.

Sellmi Posted: Saturday, January 20 2007 03:17.49 CST
@codypierce

This representation is a wonderful idea ;)




stam321 Posted: Sunday, January 21 2007 03:52.03 CST
About the emulation,
How many instructions per second can it handle?


