OpenRCE Blog Entry

Alberto M. (mandingo) <mandingo at yoire dot com>

Tuesday, December 4 2007 05:50.48 CST

Description

Fredet stands for "Flexible Regular Expression Data Extraction Tool".

The purpose of this tool is to facilitate the extraction of information contained in files using advanced regular expressions.

Fredet lets you nest and combine regular expressions, define named fields that make it easier to pass the extracted information to other tools, create new "checks" simply by editing a text file (config.xml), and so on.

Requirements

To work properly, the current version of "Fredet" needs:

* A Perl interpreter installed
* The curl binary, for HTTP requests

Help



fredet v1.3 - "Flexible Regular Expresion Data Extraction Tool"

(C) Copyleft 2007, created by Mandingo # http://www.yoire.com



Available checks (edit config.xml for checks management):



  comments             find comments in HTML file

  jpg                  extract jpg files to disk

  emails               find email addresses

  werrors              Typical Web Errors

  winfo                Typical Web Info

  asf                  action script dangerous functions

  title                find titles in HTML files

  dotnet               find .net error messages in HTML files

  words                extract words from files

  ips                  find local IP addresses

  paths                find local paths

  dbs                  find database error messages

  links                find links in files

  dlinks               find dynamic links in files

  comments             find comments in HTML files



Usage:



   fredet [[options] [<check>] [<format>] [<file|dir|url>] [<postData>]]



Options:



   -f                  print filename before each line

   -s                  read data from standard input (STDIN)

   -O                  display matching offsets

   -w                  sliding window mode (experimental)

   -x                  XML output mode

   -m <regexp>         match this <regexp> (<check> won't be used)

   -c <config_file>    config file (def. config.xml)

   -b <block_size>     block size used for reading files (def. 5242880 bytes)

   -o <out_file_fmt>   dump results to disk using <out_file_fmt> format string



Examples:



   fredet

   fredet emails

   fredet emails example.txt

   fredet emails name example.txt

   fredet emails '$0;$1' example.txt

   fredet words http://www.google.com

   fredet -m '\d+' example.txt

   fredet -o 'file_$count.jpg' jpg image.bin

Usage examples

Example 1, getting more information about a "check":



./fredet.pl emails



Details:



  description         find email addresses

  match[1].regexp     (?-xism:(\w+?)@([^\.]+)\.\w+(?=[<>\'\"\s]))

  match[1].display    found email address: $0

  field.email         $0

  field.name          $1

  field.domain        $2



Usage:



   fredet.pl emails <file>

   fredet.pl emails [field1] [field2] [...] <file>

   fredet.pl emails ['<format>'] <file>



Examples:



   fredet.pl emails example.txt

   fredet.pl emails name example.txt

   fredet.pl emails '$0;$1' example.txt



Where:



    * description: a short description of the check

    * match[1].regexp: the first regular expression of this check

    * match[1].display: optional format used to display the results of this regexp

    * email: name assigned to the first field of the regular expression ($0)

    * name: name assigned to the second field ($1)

    * domain: name assigned to the third field ($2)
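
The way fields and format strings select parts of a match can be sketched in Python. This is only an illustration, not Fredet's Perl code: the `render` and `extract` helpers and the sample address are hypothetical, while the regexp, the display string and the field indexes are copied from the check details above.

```python
import re

# Regexp and field indexes copied from the "emails" check details.
EMAIL_RE = re.compile(r"""(\w+?)@([^\.]+)\.\w+(?=[<>\'\"\s])""")
FIELDS = {"email": 0, "name": 1, "domain": 2}
DISPLAY = "found email address: $0"


def render(fmt, match):
    """Replace each $N in fmt with capture group N ($0 is the whole match)."""
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), fmt)


def extract(text, selector=None):
    """Yield one output line per match.

    selector may be None (use the check's display string), a field name
    such as "name", or a format string such as "$0;$1".
    """
    for match in EMAIL_RE.finditer(text):
        if selector is None:
            yield render(DISPLAY, match)
        elif selector in FIELDS:
            yield match.group(FIELDS[selector])
        else:
            yield render(selector, match)


sample = "write to alice@example.org today"  # hypothetical input
print(list(extract(sample)))
print(list(extract(sample, "name")))
print(list(extract(sample, "$0;$1;$2")))
```

Passing no selector mirrors "fredet.pl emails <file>", a field name mirrors "fredet.pl emails name <file>", and a format string mirrors "fredet.pl emails '$0;$1' <file>".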

Example 2a, extracting the email addresses present in the "example.txt" file:



./fredet.pl emails example.txt

found email address: [email protected]



./fredet.pl emails email example.txt

[email protected]



./fredet.pl emails '$0;$1;$2' example.txt

[email protected];j0hn;foo-ar

Where the "example.txt" file contains the following lines:



Try our wargames at <!--comment-->http://www.yoire.com, and enjoy it  

invalid@email

//this is a comment

<!--172.18.1.2,[email protected] c:windows

...



The definition of this "check" is stored inside "config.xml":



    <check name="emails" description="find email addresses">

        <match display="found email address: $0">(\w+?)@([^\.]+)\.\w+(?=[<>\'\"\s])</match>

        <field name="email" index="0"/>

        <field name="name" index="1"/>

        <field name="domain" index="2"/>

    </check>
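
To make the structure of a check concrete, here is a minimal Python sketch that loads such a definition with a standard XML parser. Several things are assumptions rather than facts about Fredet: the `<config>` root element is a guess at the file layout, `load_checks` is a hypothetical helper, and the "<" and ">" inside the pattern are written as entities because a strict parser rejects a raw "<" in element text (Fredet's own parser may be more lenient).

```python
import re
import xml.etree.ElementTree as ET

# Assumed file layout: the <config> root element is a guess, and "<" / ">"
# inside the pattern are written as entities so the XML is well-formed.
CONFIG_XML = r"""
<config>
  <check name="emails" description="find email addresses">
    <match display="found email address: $0">(\w+?)@([^\.]+)\.\w+(?=[&lt;&gt;\'\"\s])</match>
    <field name="email" index="0"/>
    <field name="name" index="1"/>
    <field name="domain" index="2"/>
  </check>
</config>
"""


def load_checks(xml_text):
    """Return {check name: {"description", "matches", "fields"}}."""
    checks = {}
    for check in ET.fromstring(xml_text).iter("check"):
        checks[check.get("name")] = {
            "description": check.get("description"),
            # Each <match> holds a regexp plus an optional display format.
            "matches": [
                (re.compile(m.text), m.get("display"))
                for m in check.findall("match")
            ],
            # Each <field> gives a name to a capture-group index.
            "fields": {
                f.get("name"): int(f.get("index"))
                for f in check.findall("field")
            },
        }
    return checks


emails = load_checks(CONFIG_XML)["emails"]
print(emails["description"])
print(emails["fields"])
```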

Example 2b, same as before using pipes:



cat example.txt|./fredet.pl -s emails

found email address: [email protected]

Example 2c, "fredet"+"find" to process multiple files at once:



find . -exec ./fredet.pl -f emails \{\} \;

./example.txt:found email address: [email protected]

Note: the "-f" option shows the name of the file being read before each result.

Example 3, using regular expressions from the command line:



./fredet.pl -m '\d+' example.txt

172

18

1

2

0

192

168

1

2

Example 4a, working with URLs instead of files:



./fredet.pl words http://www.google.com

HTML

HEAD

meta

http

equiv

content

...

POST data can be sent by adding it after the URL; enclosing it in quotes is recommended, so that the shell does not interpret characters such as "&".

Example 4b, URL+POST:



./fredet.pl words http://www.google.com 'var1=param1&var2=param2'

html

head

meta

http

equiv

content

...
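
How Fredet actually invokes curl is not shown in this post. As a rough sketch of the idea in Python, a wrapper could build the command line as below. The `build_curl_argv` and `fetch` helpers are hypothetical; `-s` (silent) and `--data` (send the string as a POST body) are standard curl options.

```python
import subprocess


def build_curl_argv(url, post_data=None):
    """Build a curl command line; with post_data, curl sends a POST."""
    argv = ["curl", "-s", url]  # -s: silent, no progress meter
    if post_data is not None:
        argv += ["--data", post_data]  # --data makes curl send a POST body
    return argv


def fetch(url, post_data=None):
    """Run curl and return the response body as bytes."""
    result = subprocess.run(
        build_curl_argv(url, post_data), capture_output=True, check=True
    )
    return result.stdout


print(build_curl_argv("http://www.google.com", "var1=param1&var2=param2"))
```

Passing the arguments as a list rather than a single shell string means the "&" in the POST data needs no extra quoting on the Python side.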

Example 5, working with large files:

From version 1.2, it's possible to work with large binary files, which are read in blocks. The v1.3 help output above gives the default block size as 5242880 bytes (5 MB); it can be changed with the -b option:



sudo ./fredet.pl -b 524288 ips /dev/mem

192.168.0.0

10.0.0.0

172.16.0.0

172.18.1.130

172.18.1.239

172.16.176.102

172.174.35.4

192.26.10.2

10.46.174.27

...
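
How Fredet treats a match that straddles two blocks is not described here. The following Python sketch shows one way to scan a stream block by block, assuming that a small overlap is carried from one block to the next. The `scan_blocks` helper, the `overlap` parameter and the IP pattern are all hypothetical (the real "ips" regexp lives in config.xml and is not shown); only the default block size comes from the help output.

```python
import io
import re

# Hypothetical pattern: the real "ips" regexp from config.xml is not shown.
IP_RE = re.compile(rb"(?:\d{1,3}\.){3}\d{1,3}")


def scan_blocks(stream, regex, block_size=5242880, overlap=64):
    """Yield regex matches from a binary stream read block by block.

    overlap must be at least 1 and at least as long as the longest
    expected match, so that a match cut by a block boundary is seen
    whole on a later pass.
    """
    carry = b""
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            # End of input: whatever is still pending is now complete.
            for match in regex.finditer(carry):
                yield match.group()
            return
        buffer = carry + chunk
        # Matches ending inside the last `overlap` bytes might continue
        # in the next block, so they are deferred rather than reported.
        limit = max(len(buffer) - overlap, 0)
        keep_from = limit
        for match in regex.finditer(buffer):
            if match.end() > limit:
                keep_from = min(limit, match.start())
                break
            yield match.group()
        carry = buffer[keep_from:]


# Tiny blocks force both addresses to straddle block boundaries.
data = b"x" * 10 + b"172.18.1.130" + b"y" * 7 + b"10.0.0.1" + b" end"
print(list(scan_blocks(io.BytesIO(data), IP_RE, block_size=8, overlap=16)))
```

The trade-off is that a larger overlap tolerates longer matches at the cost of rescanning a few bytes per block.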

Example 6, extracting all "jpg" files from a binary image:



./fredet.pl -o 'image_$count.jpg' jpg 8-jpeg-search.dd

image_0001.jpg

image_0002.jpg

image_0003.jpg

image_0004.jpg

image_0005.jpg

image_0006.jpg

image_0007.jpg
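
The idea behind this kind of extraction can be sketched in a few lines of Python. This is a naive illustration under stated assumptions, not the actual "jpg" check: the pattern simply looks for a JPEG start marker (FF D8 FF) followed by the nearest end marker (FF D9), the helper names are hypothetical, and the four-digit counter starting at 1 is inferred from the file names in the output above.

```python
import re
from pathlib import Path

# Assumed signature: start-of-image marker FF D8 FF ... end-of-image FF D9.
# The real "jpg" regexp from config.xml is not shown in this post.
JPEG_RE = re.compile(rb"\xff\xd8\xff.*?\xff\xd9", re.DOTALL)


def carve_jpegs(data, out_fmt="image_$count.jpg"):
    """Return (filename, bytes) pairs for each JPEG-like span in data."""
    results = []
    for count, match in enumerate(JPEG_RE.finditer(data), start=1):
        # Four-digit, 1-based counter, mirroring image_0001.jpg above.
        name = out_fmt.replace("$count", f"{count:04d}")
        results.append((name, match.group()))
    return results


def dump(results, directory):
    """Write each carved span to directory/filename."""
    for name, blob in results:
        Path(directory, name).write_bytes(blob)


# Two fake "images" embedded in surrounding junk bytes.
blob = b"junk\xff\xd8\xff\xe0AAAA\xff\xd9more\xff\xd8\xff\xe1BB\xff\xd9tail"
for name, image in carve_jpegs(blob):
    print(name, len(image))
```

Stopping at the first FF D9 truncates any image that contains those bytes before its real end, for example one with an embedded thumbnail, so a real carver needs a more careful pattern.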

Config.xml

All the checks are configured inside this file. This is the basic format of a check:



<check name="check name" description="'check' description">

    <match[ modifiers="modifiers"][ display="format"][ output="output_file"]>regexp</match>

    [more "match" definitions]

    [<field name="field name" index="num1"/>]

    [<field name="field name" index="num2"/>]

    [more "field" definitions]

</check>

Some real examples follow. The first one extracts the words (of length 3 or more) from a file:



<check name="words" description="extract words from files">

    <match modifiers="i" display="$1">([a-z]{3,}?)\</match>

</check>

This example may help to extract the dynamic links inside a downloaded HTML page:



<check name="dlinks" description="find dynamic links in files">

    <match display="$1 $2">(\w+)=[\"\']?(https*://.+\?.+?=.+?(?=[,\s\"\'<>]))</match>

    <match display="txt $0">(?!=[\"\'])(https*://.+\?.+?=.+?(?=[,\s<>\"\']))</match>

</check>

Where:

* "display" is an optional parameter that allows us to specify the output format for this regexp
* "modifiers" may be added to the regular expression; for example, "i" makes the regexp "case-insensitive."
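
The effect of "modifiers" and "display" can be sketched in Python as follows. The mapping from Perl-style modifier letters to Python flags is an assumption (the post does not list which modifiers Fredet accepts), `compile_match` and `run_match` are hypothetical helpers, and the pattern is a simplified stand-in for the "words" regexp.

```python
import re

# Perl-style modifier letters and the closest Python flags (an assumption:
# the set of modifiers Fredet accepts is not listed in the post).
FLAG_FOR = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
}


def compile_match(pattern, modifiers=""):
    """Compile pattern with the flags named by the modifier letters."""
    flags = 0
    for letter in modifiers:
        flags |= FLAG_FOR[letter]
    return re.compile(pattern, flags)


def run_match(pattern, text, modifiers="", display="$0"):
    """Return one display string per match, replacing $N with group N."""
    regex = compile_match(pattern, modifiers)
    return [
        re.sub(r"\$(\d+)", lambda ref: m.group(int(ref.group(1))), display)
        for m in regex.finditer(text)
    ]


text = "<HTML><head> a bc"
print(run_match(r"([a-z]{3,})", text, modifiers="i", display="$1"))
print(run_match(r"([a-z]{3,})", text, display="$1"))
```

With "i" the upper-case word is reported as well; without it only the lower-case word matches.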

Download

Download Fredet v1.3

c1de0x	Posted: Wednesday, December 5 2007 01:13.35 CST
Dude.... Please publish changelogs... It's difficult to tell what changes from version to version!