📚 OpenRCE is preserved as a read-only archive. Launched at RECon Montreal in 2005. Registration and posting are disabled.









Created: Saturday, May 12 2007 15:54.00 CDT Modified: Sunday, May 13 2007 15:27.47 CDT
Scanning data for entropy anomalies
Author: ero

l0re just asked the following question in the OpenRCE forums:

Im currently searching for a tool that does an entropy analyse. I want it to use it for finding a RSA key in a binary file. I have seen a tool that could do this on a workshop but unfortunately I dont know the name of tool and I cant find it with help of google. Does any one know the name of the tool or a tool that could do this?

I don't know of such a tool off the top of my head, although PEiD and OllyDBG both run statistical tests to detect possibly compressed or packed executables.

But having to come up with such things is one of the reasons why I love Python and Mathematica+Pythonika. With both, it's possible to put together the desired functionality in a few minutes.

So, the idea is to spot the typically high entropy that something like an RSA key stored in binary form should exhibit. Assuming that it's stored within data of significantly lower entropy, such as a standard executable file (that is, one not itself packed or compressed), it should be easy to spot visually. Let's check...

First we need a function that calculates the entropy of a given chunk of data. The following code takes a Python string and calculates its byte entropy, returning a real number between 0.0 and 8.0 (bits per byte).
Values close to 8.0 indicate high entropy, and hence likely compressed or otherwise highly random data. Low values indicate low-complexity data such as text, executable instructions, or any other data exhibiting clear patterns.


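import math

def H(data):
  # Shannon entropy of a string, in bits per byte (0.0 to 8.0).
  if not data:
    return 0.0
  entropy = 0.0
  for x in range(256):
    # Relative frequency of byte value x in the data.
    p_x = float(data.count(chr(x)))/len(data)
    # Values that never occur contribute nothing to the sum.
    if p_x > 0:
      entropy += - p_x*math.log(p_x, 2)
  return entropy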
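
The listing above works on character strings. As a minimal sketch of the same calculation for Python 3 `bytes` objects (for example, the contents of a file opened in binary mode), the version below counts byte values with `collections.Counter`. The name `byte_entropy` is a hypothetical one chosen for this illustration and is not part of the original post.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Iterating over bytes yields integers 0..255, so Counter builds
    # a histogram of byte values in a single pass.
    counts = Counter(data)
    # Only values that actually occur appear in counts, so log2 is
    # never called on zero.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

print(byte_entropy(b"\x00" * 64))       # a single repeated value
print(byte_entropy(bytes(range(256))))  # every byte value exactly once
```

The single pass over the data avoids calling `count` once per possible byte value, which matters when the function is run on many overlapping blocks.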



Next we want to take a chunk of data and run the entropy function across it in one-byte increments with a defined block size. Starting from the byte at offset 0, we calculate the entropy of each block of the given size and yield its value. The function is a generator, so we can easily get a list of entropies for all offsets and feed it into a plotting function.


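def entropy_scan(data, block_size):
  # Slide a window of block_size bytes across data, one byte at a time,
  # and yield the entropy of each window, starting at offset 0.
  # The + 1 includes the final window, which ends on the last byte.
  for x in range(len(data) - block_size + 1):
    yield H(data[x:x + block_size])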
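
Recomputing the entropy from scratch at every offset repeats almost all of the work, since consecutive blocks differ by only one byte leaving and one byte entering. The following is a sketch of an incremental variant, assuming Python 3 and `bytes` input. It relies on the identity H = log2(n) - (1/n) * sum(c * log2(c)), where n is the block size and c ranges over the byte counts in the block. The name `entropy_scan_incremental` is hypothetical and not from the original post.

```python
import math

def entropy_scan_incremental(data: bytes, block_size: int):
    """Yield the byte entropy of every block_size-byte window of data."""
    n = block_size
    if n <= 0 or len(data) < n:
        return
    counts = [0] * 256

    def term(c):
        # Contribution of one byte value to sum(c * log2(c)); 0 when c is 0.
        return c * math.log2(c) if c > 0 else 0.0

    # Build the histogram and the running sum for the first window.
    for b in data[:n]:
        counts[b] += 1
    s = sum(term(c) for c in counts)
    yield math.log2(n) - s / n

    # Slide one byte at a time: one value leaves, one value enters.
    for i in range(n, len(data)):
        for b, delta in ((data[i - n], -1), (data[i], 1)):
            s -= term(counts[b])
            counts[b] += delta
            s += term(counts[b])
        # Rounding error can leave results a hair away from the exact value.
        yield math.log2(n) - s / n
```

It yields one value per window, `len(data) - block_size + 1` in total, and each step after the first costs a constant amount of work regardless of the block size.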



Now we need some test data. The following code generates a low-entropy chunk of data 1024 bytes long, followed by a high-entropy chunk (assuming the random generator is good enough, which is the case for this example), also 1024 bytes long, and closes with 1024 more bytes of low-entropy data.


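import random

# 1024 low-entropy bytes (values 0-64), then 1024 high-entropy bytes
# (values 0-255), then 1024 more low-entropy bytes.
data = ''.join(
  [chr(random.randint(0, 64)) for x in range(1024)] +
  [chr(random.randint(0, 255)) for x in range(1024)] +
  [chr(random.randint(0, 64)) for x in range(1024)])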



If we run the Python code within Mathematica


ListPlot[ Py["<
  list(
    entropy_scan( data, 256 ) )
>"] ]


we obtain the following plot



displaying a noticeable bump in the region where the higher entropy data lies within our test data.
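
The bump can also be located without a plot by reporting the window offsets whose entropy exceeds a cutoff. The sketch below assumes Python 3 and `bytes` data. The names `byte_entropy` and `high_entropy_offsets`, the fixed seed, and the 6.5-bit threshold are all assumptions made for this illustration rather than part of the original post.

```python
import math
import random
from collections import Counter

def byte_entropy(block: bytes) -> float:
    """Shannon entropy of a bytes object, in bits per byte."""
    total = len(block)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(block).values())

def high_entropy_offsets(data: bytes, block_size: int, threshold: float):
    """Return (first, last) window offsets whose entropy exceeds threshold,
    or None if no window does."""
    hits = [off for off in range(len(data) - block_size + 1)
            if byte_entropy(data[off:off + block_size]) > threshold]
    return (hits[0], hits[-1]) if hits else None

def random_bytes(max_value: int, length: int) -> bytes:
    # Byte values drawn uniformly from 0..max_value (inclusive).
    return bytes(random.randint(0, max_value) for _ in range(length))

random.seed(0)  # fixed seed so the run is repeatable
data = random_bytes(64, 1024) + random_bytes(255, 1024) + random_bytes(64, 1024)

# A window holding only values 0..64 cannot exceed log2(65), about 6.02 bits,
# so 6.5 is an assumed threshold sitting above the low-entropy plateau.
print(high_entropy_offsets(data, 256, 6.5))
```

Because a window needs at least one byte from the high-entropy chunk to cross the threshold, the reported offsets can only fall where a 256-byte window overlaps the range 1024 to 2047, which brackets the region of interest.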

Update:

Deadhacker has posted an augmented version of my hack that does not rely on Mathematica and can also run on arbitrary files passed as arguments to his script.

