Fredet v1.1
Alberto M. (mandingo) <mandingoyoirecom> Tuesday, November 27 2007 05:52.17 CST


Fredet stands for "Flexible Regular Expression Data Extraction Tool".

The purpose of this tool is to facilitate the extraction of information contained in files using
advanced regular expressions.

Fredet lets you nest and combine regular expressions, define named fields that make it easier to pass the
extracted information to other tools, create new "checks" by simply editing a text file (config.xml), and so on.

Requirements

To work properly, the current version of "Fredet" needs:

    * A Unix/Linux or Windows operating system
    * A Perl interpreter with the "File::Basename" module installed

Help

fredet v1.1 - "Flexible Regular Expresion Data Extraction Tool"
(C) Copyleft 2007, created by Mandingo # http://www.yoire.com

Available checks (edit config.xml for checks management):

  asf                  action script dangerous functions
  words                extract words from files
  emails               find email addresses
  ips                  find local IP addresses
  paths                find local paths
  dbs                  find database error messages
  links                find links in files
  dlinks               find dynamic links in files
  comments             find comments in HTML files

Usage:

   fredet.pl [options] [[<check>] [<format>] <file>]]

Options:

   -f                  print filename before each line
   -s                  read data from standard input (STDIN)
   -m <regexp>         match this <regexp> (<check> won't be used)

Examples:

   fredet.pl
   fredet.pl emails
   fredet.pl emails example.txt
   fredet.pl emails name example.txt
   fredet.pl emails '$0;$1' example.txt


Usage examples

* Example 1, getting more information about a "check":

./fredet.pl emails

Details:

  description         find email addresses
  match[1].regexp     (?-xism:(\w+?)@([^\.]+)\.\w+(?=[<>\'\"\s]))
  match[1].display    found email address: $0
  field.email         $0
  field.name          $1
  field.domain        $2

Usage:

   fredet.pl emails <file>
   fredet.pl emails [field1] [field2] [...] <file>
   fredet.pl emails ['<format>'] <file>

Examples:

   fredet.pl emails example.txt
   fredet.pl emails name example.txt
   fredet.pl emails '$0;$1' example.txt

Where:

    * description: description of the check
    * match[1].regexp: the first regular expression of this check
    * match[1].display: optional format used to display the results of this regexp
    * email: name assigned to $0, the whole match
    * name: name assigned to $1, the first capture group
    * domain: name assigned to $2, the second capture group


* Example 2a, extracting the email addresses present in the "example.txt" file:

./fredet.pl emails example.txt
found email address: [email protected]

./fredet.pl emails email example.txt
[email protected]

./fredet.pl emails '$0;$1;$2' example.txt
[email protected];j0hn;foo-ar

The "example.txt" file contains the following lines:

Try our wargames at <!--comment-->http://www.yoire.com, and enjoy it  
invalid@email
//this is a comment
<!--172.18.1.2,[email protected] c:windows
...

The definition of this "check" is stored inside "config.xml":

    <check name="emails" description="find email addresses">
        <match display="found email address: $0">(\w+?)@([^\.]+)\.\w+(?=[<>\'\"\s])</match>
        <field name="email" index="0"/>
        <field name="name" index="1"/>
        <field name="domain" index="2"/>
    </check>
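
The way the match, the fields and the format string fit together can be sketched in a few lines of Python. This is only an illustration, not Fredet's own code: the pattern is the one from the check above (written with "\.\w+" before the lookahead), while the helper names "render" and "field" and the sample address are made up for the example.

```python
import re

# The "emails" pattern from config.xml, written as a Python raw string.
EMAIL_RE = re.compile(r"""(\w+?)@([^\.]+)\.\w+(?=[<>'"\s])""")

# Field names and indices as declared by the <field> elements.
FIELDS = {"email": 0, "name": 1, "domain": 2}


def render(fmt, match):
    """Replace each $N in fmt with capture group N ($0 is the whole match)."""
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), fmt)


def field(name, match):
    """Look up a named field through its index (hypothetical helper)."""
    return match.group(FIELDS[name])


# Hypothetical input line; the address is invented for illustration.
sample = "contact: alice@example.org \n"
m = EMAIL_RE.search(sample)

print(render("found email address: $0", m))  # default "display" format
print(field("name", m))                      # like: fredet.pl emails name <file>
print(render("$0;$1;$2", m))                 # like: fredet.pl emails '$0;$1;$2' <file>
```

Note that the trailing lookahead requires a space, quote or angle bracket after the address, so in this sketch an address at the very end of the input, with nothing after it, would not match.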


* Example 2b, same as before using pipes:

cat example.txt|./fredet.pl -s emails
found email address: [email protected]


* Example 2c, "fredet"+"find" to process multiple files at once:

find . -exec ./fredet.pl -f emails {} \;
./example.txt:found email address: [email protected]

Note: the "-f" option prefixes each result with the name of the file it was found in.

* Example 3, using regular expressions from the command line:

./fredet.pl -m '\d+' example.txt
172
18
1
2
0
192
168
1
2
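
The stray "0" in the middle of that output is easy to explain with a small sketch. The Python below is not part of Fredet; it applies the same "\d+" pattern to a hypothetical fragment shaped like the sample line, an IP address followed by a name that contains a digit.

```python
import re

# Hypothetical fragment: an IP address followed by a name with a digit in it.
line = "<!--172.18.1.2,j0hn"

# Every maximal run of digits is reported as a separate result.
numbers = re.findall(r"\d+", line)

for n in numbers:
    print(n)
```

Each octet of the IP address comes out as its own result, and the digit inside the name is reported as well.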


Config.xml

All the checks are configured inside this file. This is the basic format of a check:

<check name="check name" description="'check' description">
    <match[ modifiers="modifiers"][ display="format"]>regexp</match>
    [more "match" definitions]
    [<field name="field name" index="num1"/>]
    [<field name="field name" index="num2"/>]
    [more "field" definitions]
</check>


Here are some real examples. The first one extracts the words (of length >= 3) from a file:

<check name="words" description="extract words from files">
    <match modifiers="i" display="$1">([a-z]{3,}?)\w</match>
</check>

This example may help to extract the dynamic links inside a downloaded HTML page:

<check name="dlinks" description="find dynamic links in files">
    <match display="$1 $2">(\w+)=[\"\']?(https*://.+\?.+?=.+?(?=[,\s\"\'<>]))</match>
    <match display="txt $0">(?!=[\"\'])(https*://.+\?.+?=.+?(?=[,\s<>\"\']))</match>
</check>

Where:

    * "display" is an optional attribute that sets the output format for this regexp
    * "modifiers" is an optional attribute that adds modifiers to the regular expression; for example, "i" makes the regexp case-insensitive
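
To make the structure of a check concrete, here is a rough Python sketch that reads the "words" definition shown above and turns its attributes into a compiled pattern. It is an assumption-laden illustration, not how Fredet itself loads config.xml: the "load_check" function and the letter-to-flag table are hypothetical, and a strict XML parser such as this one would need any raw "<" inside a regexp to be escaped.

```python
import re
import xml.etree.ElementTree as ET

# The "words" check, copied from the example above.
CHECK_XML = r'''
<check name="words" description="extract words from files">
    <match modifiers="i" display="$1">([a-z]{3,}?)\w</match>
</check>
'''

# Assumed mapping from modifier letters to Python regex flags.
FLAGS = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE, "x": re.VERBOSE}


def load_check(xml_text):
    """Parse one <check> element into a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    for node in root.findall("match"):
        flags = 0
        for letter in node.get("modifiers", ""):
            flags |= FLAGS[letter]
        matches.append({
            "regexp": re.compile(node.text, flags),
            "display": node.get("display"),  # None when the attribute is absent
        })
    # Optional <field> elements map a name to a capture group index.
    fields = {f.get("name"): int(f.get("index")) for f in root.findall("field")}
    return {
        "name": root.get("name"),
        "description": root.get("description"),
        "matches": matches,
        "fields": fields,
    }


check = load_check(CHECK_XML)
print(check["name"], "-", check["description"])
print(check["matches"][0]["regexp"].pattern, check["matches"][0]["display"])
```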

Download

Download Fredet v1.1




Comments
trufae Posted: Wednesday, November 28 2007 14:07.06 CST
nice tool :)

Since it's based on Perl, I understand that it works with binary data too.

It would be good to allow start/end marks and binary structure checks with a max length. That way, fredet could be used as a carver, like scalpel.

mandingo Posted: Thursday, November 29 2007 05:58.39 CST
I've added some changes to work better with binaries.. check the new version :)