OpenRCE Blog Entry

Ero Carrera (ero) <ero carrera gmail com>

Saturday, May 12 2007 15:54.00 CDT

l0re just asked the following question in the OpenRCE forums:

I'm currently searching for a tool that does an entropy analyse. I want it to use it for finding a RSA key in a binary file. I have seen a tool that could do this on a workshop but unfortunately I don't know the name of tool and I can't find it with help of google. Does any one know the name of the tool or a tool that could do this?

I don't know of such a tool off the top of my head, although PEiD and OllyDbg both do statistical tests in order to detect possibly compressed or packed executables.

But having to come up with such things is one of the reasons why I love Python and Mathematica+Pythonika. With both, it's possible to put together the desired functionality in a few minutes.

So, the idea is to spot the typically high entropy that should be exhibited by something like an RSA key stored in binary form. Assuming that it's stored within data of significantly lower entropy, such as a standard executable file (that is, one that is not itself packed or compressed), it should be easy to spot visually. Let's check...

First we need a function that calculates the entropy of a given chunk of data. The following code takes a Python string and calculates its byte entropy, returning a real number between 0.0 and 8.0.
Values close to 8.0 indicate high entropy, and hence likely compressed or otherwise highly random data. Low values indicate low-complexity data such as text, executable instructions, or any other data exhibiting clear patterns.
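For reference, the quantity being computed is the Shannon entropy of the byte-value distribution, where p_x is the fraction of bytes in the chunk equal to x, and values that never occur (p_x = 0) contribute nothing, which is why the code skips them:

```latex
H = -\sum_{x=0}^{255} p_x \log_2 p_x, \qquad 0 \le H \le \log_2 256 = 8
```

The upper bound of 8 bits per byte is reached only when all 256 values occur equally often; a chunk consisting of a single repeated byte gives 0.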

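import math

def H(data):
  """Byte entropy of data, in bits per byte (0.0 to 8.0)."""
  if not data:
    return 0.0
  entropy = 0.0
  for x in range(256):
    # Relative frequency of byte value x within the chunk
    p_x = float(data.count(chr(x))) / len(data)
    if p_x > 0:
      entropy += - p_x * math.log(p_x, 2)
  return entropy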
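A quick sanity check of the extremes, written as a Python 3 sketch: it re-implements H for a bytes object instead of a Python 2 string, counting byte values with bytes.count, which accepts an integer 0 to 255 in Python 3. The three inputs are illustrative choices, not from the original post.

```python
import math

def H(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    entropy = 0.0
    for x in range(256):
        # bytes.count accepts an int byte value in Python 3
        p_x = data.count(x) / len(data)
        if p_x > 0:
            entropy -= p_x * math.log2(p_x)
    return entropy

print(H(b"A" * 1024))          # one repeated value: 0.0
print(H(bytes(range(256))))    # every byte value exactly once: 8.0
print(H(b"AB" * 512))          # two equally likely values: 1.0
```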

Next we want to take a chunk of data and run the entropy calculation all across it with a defined block size, advancing one byte at a time. Starting from the byte at offset 0, we calculate the entropy of each block of the given size and yield its value. The function is a generator, so we can easily get a list of entropies for all offsets and feed it into a plotting function.

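def entropy_scan(data, block_size):
  # Slide a block_size window over data one byte at a time and yield the
  # entropy of each window; the last window starts at len(data) - block_size
  for x in range(len(data) - block_size + 1):
    yield H(data[x:x + block_size])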
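Recomputing H from scratch at every offset means re-reading the whole block 256 times per offset. One possible optimization, which is an assumption of this sketch and not part of the original hack, is to keep a running histogram of byte values: as the window moves one byte, decrement the count of the byte that leaves and increment the count of the byte that enters. The names entropy_from_counts and entropy_scan_fast are hypothetical; the sketch targets Python 3 bytes input and yields the same values as the per-block scan.

```python
import math

def entropy_from_counts(counts, total):
    """Entropy in bits per byte from a 256-entry histogram of byte values."""
    entropy = 0.0
    for c in counts:
        if c:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy

def entropy_scan_fast(data: bytes, block_size: int):
    """Yield the entropy of every block_size window, sliding one byte at a time."""
    if block_size <= 0 or block_size > len(data):
        return
    counts = [0] * 256
    for b in data[:block_size]:
        counts[b] += 1                                # iterating bytes yields ints
    yield entropy_from_counts(counts, block_size)
    for start in range(1, len(data) - block_size + 1):
        counts[data[start - 1]] -= 1                  # byte leaving the window
        counts[data[start + block_size - 1]] += 1     # byte entering the window
        yield entropy_from_counts(counts, block_size)
```

Each step still sums over the 256 histogram bins, but no longer rescans the block, so the cost per offset no longer grows with block_size.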

Now we need some test data. The following code generates a 1024-byte chunk of lower-entropy data (byte values 0 to 64 only), followed by a 1024-byte high-entropy chunk (byte values 0 to 255, assuming the random generator is good enough, which is the case for this example), and closes with another 1024 bytes of lower-entropy data.

import random

data = ''.join(
  [chr(random.randint(0, 64)) for x in range(1024)] +
  [chr(random.randint(0, 255)) for x in range(1024)] +
  [chr(random.randint(0, 64)) for x in range(1024)] )
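To put numbers on "low" and "high" here: bytes drawn from only 65 possible values (0 to 64) can carry at most log2(65), about 6.02 bits per byte, while uniform values from 0 to 255 approach 8. The following Python 3 sketch builds equivalent test data as bytes and measures each 1024-byte segment; the fixed seed is an addition for repeatability. With only 1024 samples spread over 256 possible values, the high-entropy segment should come out somewhat below 8.0 rather than exactly 8.0.

```python
import math
import random

def H(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    entropy = 0.0
    for x in range(256):
        p_x = data.count(x) / len(data)
        if p_x > 0:
            entropy -= p_x * math.log2(p_x)
    return entropy

rng = random.Random(12345)   # fixed seed so the run is repeatable
low1 = bytes(rng.randint(0, 64) for _ in range(1024))
high = bytes(rng.randint(0, 255) for _ in range(1024))
low2 = bytes(rng.randint(0, 64) for _ in range(1024))
data = low1 + high + low2

for name, segment in (("low", low1), ("high", high), ("low", low2)):
    print(f"{name:>4}: {H(segment):.3f} bits/byte")
print(f"upper bound for values 0..64: {math.log2(65):.3f}")
```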

If we run the Python code from within Mathematica through Pythonika,

ListPlot[ Py["<
list(
entropy_scan( data, 256 ) )
>"] ]

we obtain the following plot, displaying a noticeable bump in the region where the higher-entropy data lies within our test data.
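The bump can also be located without plotting. A minimal Python 3 sketch, where high_entropy_regions, threshold and step are hypothetical names introduced for illustration: scan with a sliding window, then merge overlapping windows whose entropy exceeds the threshold into (start, end) byte ranges. Since a window starting at offset i covers bytes i to i + block_size - 1, each flagged window extends the reported range to its own end. The threshold has to be tuned per input; 6.5 works for this test data because windows drawn only from values 0 to 64 can never exceed log2(65), about 6.02.

```python
import math
import random

def H(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    entropy = 0.0
    for x in range(256):
        p_x = data.count(x) / len(data)
        if p_x > 0:
            entropy -= p_x * math.log2(p_x)
    return entropy

def high_entropy_regions(data: bytes, block_size: int, threshold: float, step: int = 1):
    """Return merged (start, end) byte ranges, end exclusive, of windows above threshold."""
    regions = []
    start = end = None
    for offset in range(0, len(data) - block_size + 1, step):
        if H(data[offset:offset + block_size]) > threshold:
            if start is None or offset > end:
                # window does not touch the current region: close it, open a new one
                if start is not None:
                    regions.append((start, end))
                start = offset
            end = offset + block_size
    if start is not None:
        regions.append((start, end))
    return regions

# Test data shaped like the example: low, high, low (fixed seed for repeatability)
rng = random.Random(2007)
data = (bytes(rng.randint(0, 64) for _ in range(1024)) +
        bytes(rng.randint(0, 255) for _ in range(1024)) +
        bytes(rng.randint(0, 64) for _ in range(1024)))
print(high_entropy_regions(data, 256, 6.5, step=16))
```

The reported range is slightly wider than the true high-entropy region, because windows that only partly overlap it can also cross the threshold.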

Update:

Deadhacker has posted an augmented version of my hack that does not rely on Mathematica, in addition to being able to run on arbitrary files passed as arguments to his script.
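Along the same lines, here is a minimal standalone Python 3 sketch that reads the files named on the command line and prints the entropy of fixed-size windows. It is not Deadhacker's script; scan_file and main are hypothetical names, and the default block size of 256 bytes, the step of 64 bytes and the output format are arbitrary choices for illustration.

```python
import math
import sys

def H(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    entropy = 0.0
    for x in range(256):
        p_x = data.count(x) / len(data)
        if p_x > 0:
            entropy -= p_x * math.log2(p_x)
    return entropy

def scan_file(path, block_size=256, step=64):
    """Yield (offset, entropy) for each block_size window, every step bytes."""
    with open(path, "rb") as f:
        data = f.read()
    for offset in range(0, len(data) - block_size + 1, step):
        yield offset, H(data[offset:offset + block_size])

def main(argv):
    for path in argv[1:]:
        print(path)
        for offset, entropy in scan_file(path):
            print(f"  0x{offset:08x}  {entropy:.3f}")

if __name__ == "__main__":
    main(sys.argv)
```

Usage: `python entropy_scan.py some_binary.exe`, then look for runs of values close to 8.0 in an otherwise lower-entropy file.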
