Alberto M. (mandingo) <mandingo yoire com>
Thursday, November 29 2007 06:02.57 CST
Description
Fredet stands for "Flexible Regular Expression Data Extraction Tool".
The purpose of this tool is to facilitate the extraction of information contained in files using advanced regular expressions.
Fredet lets you nest and combine regular expressions, define named fields that make it easier to pass the extracted information on to other tools, and create new "checks" simply by editing a text file (config.xml).
Requirements
To work properly, the current version of Fredet needs:
* a Unix/Linux or Windows operating system
* a Perl interpreter with the "File::Basename" module installed
Help
fredet v1.2 - "Flexible Regular Expresion Data Extraction Tool"
(C) Copyleft 2007, created by Mandingo # http://www.yoire.com
Available checks (edit config.xml for checks management):
werrors Typical Web Errors
winfo Typical Web Info
asf action script dangerous functions
title find titles in HTML files
dotnet find .net error messages in HTML files
words extract words from files
emails find email addresses
ips find local IP addresses
paths find local paths
dbs find database error messages
links find links in files
dlinks find dynamic links in files
comments find comments in HTML file
Usage:
fredet.pl [options] [[<check>] [<format>] <file>]]
Options:
-f print filename before each line
-s read data from standard input (STDIN)
-m <regexp> match this <regexp> (<check> won't be used)
-b <block_size> block size used for reading files (def. 1048576 bytes)
Examples:
fredet.pl
fredet.pl emails
fredet.pl emails example.txt
fredet.pl emails name example.txt
fredet.pl emails '$0;$1' example.txt
fredet.pl words http://www.google.com
fredet.pl -m '\d+' example.txt
Usage examples
Example 1, getting more information about a "check":
./fredet.pl emails
Details:
description find email addresses
match[1].regexp (?-xism:(\w+?)@([^\.]+)\.\w+(?=[<>\'\"\s]))
match[1].display found email address: $0
field.email $0
field.name $1
field.domain $2
Usage:
fredet.pl emails <file>
fredet.pl emails [field1] [field2] [...] <file>
fredet.pl emails ['<format>'] <file>
Examples:
fredet.pl emails example.txt
fredet.pl emails name example.txt
fredet.pl emails '$0;$1' example.txt
Where:
* description: description of the check
* match[1].regexp: the first (no. 1) regular expression of this check
* match[1].display: optional format used to display the results of this regexp
* email: name assigned to field $0 of the regular expression (the whole match)
* name: name assigned to field $1 (the first group)
* domain: name assigned to field $2 (the second group)
Example 2a, extracting the email addresses present in the "example.txt" file:
./fredet.pl emails example.txt
found email address: [email protected]
./fredet.pl emails email example.txt
[email protected]
./fredet.pl emails '$0;$1;$2' example.txt
[email protected];j0hn;foo-ar
Where the "example.txt" file contains the following lines:
Try our wargames at <!--comment-->http://www.yoire.com, and enjoy it
invalid@email
//this is a comment
<!--172.18.1.2,[email protected] c:windows
...
The definition of this "check" is stored inside "config.xml":
<check name="emails" description="find email addresses">
<match display="found email address: $0">(\w+?)@([^\.]+)\.\w+(?=[<>\'\"\s])</match>
<field name="email" index="0"/>
<field name="name" index="1"/>
<field name="domain" index="2"/>
</check>
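How the match groups map to the $0, $1 and $2 placeholders can be sketched outside Fredet. The following Python snippet is only an illustration, not Fredet's Perl code: the helper name render and the sample address are made up, and it assumes that $N simply stands for capture group N, with $0 being the whole match, as the output of Example 2a suggests.

```python
import re

# Pattern copied from the "emails" check.
EMAIL = re.compile(r"""(\w+?)@([^\.]+)\.\w+(?=[<>\'\"\s])""")

# Field names and group indexes, taken from the <field> elements.
FIELDS = {"email": 0, "name": 1, "domain": 2}

def render(match, fmt):
    """Replace each $N in fmt with capture group N of the match (hypothetical helper)."""
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), fmt)

text = "contact: alice@example.org \n"  # hypothetical sample line
for m in EMAIL.finditer(text):
    print(render(m, "found email address: $0"))  # the check's display format
    print(m.group(FIELDS["name"]))               # a single named field
    print(render(m, "$0;$1;$2"))                 # a custom format string
```

Note that the trailing lookahead requires a quote, angle bracket or whitespace right after the address, which is why the sample line has a space after it.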
Example 2b, same as before using pipes:
cat example.txt|./fredet.pl -s emails
found email address: [email protected]
Example 2c, "fredet"+"find" to process multiple files at once:
find . -exec ./fredet.pl -f emails {} \;
./example.txt:found email address: [email protected]
Note: the "-f" option prints the name of the file being read before each result.
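The per-file prefixing can be pictured in a few lines. This Python sketch is an illustration only and not Fredet's implementation; grep_tree is a hypothetical helper, and the pattern passed to it in the usage note is a simplified stand-in for the "emails" check.

```python
import os
import re

def grep_tree(root, pattern):
    """Yield 'path:match' for every regex match in every file under root."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            for m in regex.finditer(data):
                yield f"{path}:{m.group(0)}"
```

Iterating over grep_tree(".", r"\w+@\w+\.\w+") would then give one "path:address" string per hit, much like the output line above.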
Example 3, using regular expressions from the command line:
./fredet.pl -m '\d+' example.txt
172
18
1
2
0
192
168
1
2
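The lone 0 in this output comes from the digit in the name j0hn (see Example 2a), because \d+ matches every run of digits wherever it occurs. A rough Python equivalent, for illustration only; the input string is a shortened stand-in for the corresponding line of example.txt:

```python
import re

# Shortened stand-in for the example.txt line that holds an IP and the name "j0hn".
line = "172.18.1.2,j0hn"

numbers = re.findall(r"\d+", line)
for number in numbers:
    print(number)
```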
Example 4a, working with URLs instead of files:
./fredet.pl words http://www.google.com
HTML
HEAD
meta
http
equiv
content
...
POST data can be sent by adding it after the URL; enclose it in quotes so that the shell does not interpret characters such as "&".
Example 4b, URL+POST:
./fredet.pl words http://www.google.com 'var1=param1&var2=param2'
html
head
meta
http
equiv
content
...
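Conceptually, supplying POST data turns the request from a GET into a POST with a form-encoded body. The sketch below uses Python's urllib to show that distinction. It is not how Fredet (a Perl script) performs the request, build_request is a hypothetical helper, and nothing is actually sent over the network.

```python
from urllib.request import Request

def build_request(url, post_data=None):
    """Return a GET request, or a POST request when post_data is given."""
    body = post_data.encode("ascii") if post_data is not None else None
    return Request(url, data=body)

get_req = build_request("http://www.google.com")
post_req = build_request("http://www.google.com", "var1=param1&var2=param2")

print(get_req.get_method())
print(post_req.get_method())
```

urllib picks the method from the presence of a body, so the second request reports POST while the first reports GET.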
Example 5, working with large files:
As of version 1.2, Fredet can work with large binary files. The default block size is 1 MB (1048576 bytes), and it can be changed with the new -b option:
sudo ./fredet.pl -b 524288 ips /dev/mem
192.168.0.0
10.0.0.0
172.16.0.0
172.18.1.130
172.18.1.239
172.16.176.102
172.174.35.4
192.26.10.2
10.46.174.27
...
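Reading in fixed-size blocks raises one question: what happens to a match that straddles two blocks? This page does not say how Fredet handles that case. The Python sketch below shows one common approach, which defers matches near the end of a block until the next block has been read. The function scan_blocks, its overlap parameter and the simplified IPv4 pattern are all hypothetical, and the sketch assumes every match is shorter than overlap bytes.

```python
import io
import re

def scan_blocks(stream, pattern, block_size=1048576, overlap=64):
    """Yield regex matches from a binary stream read block by block."""
    regex = re.compile(pattern)
    tail = b""
    while True:
        block = stream.read(block_size)
        at_eof = not block
        buffer = tail + block
        # Matches starting at or after `limit` are deferred until more data
        # has been read, unless the end of the input has been reached.
        limit = len(buffer) if at_eof else max(0, len(buffer) - overlap)
        keep_from = limit
        for m in regex.finditer(buffer):
            if m.start() >= limit:
                break
            yield m.group(0)
            keep_from = max(keep_from, m.end())
        if at_eof:
            break
        # Carry over the unprocessed end of the buffer into the next round.
        tail = buffer[keep_from:]

# Simplified IPv4 pattern (an assumption, not the regexp of the "ips" check).
IPV4 = rb"\d{1,3}(?:\.\d{1,3}){3}"
data = io.BytesIO(b"a 10.0.0.1 b 172.18.1.130 c 192.168.1.2\n")
found = list(scan_blocks(data, IPV4, block_size=8, overlap=16))
print(found)
```

The tiny block size in the demo forces every address to cross at least one block boundary, yet each one is still reported once and in full.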
Config.xml
All the checks are configured inside this file. This is the basic format of a check:
<check name="check name" description="'check' description">
<match[ modifiers="modifiers"][ display="format"]>regexp</match>
[more "match" definitions]
[<field name="field name" index="num1"/>]
[<field name="field name" index="num2"/>]
[more "field" definitions]
</check>
The following are real examples. The first one extracts the words of a file that are at least three characters long:
<check name="words" description="extract words from files">
<match modifiers="i" display="$1">([a-z]{3,}?)\W</match>
</check>
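The lazy quantifier {3,}? grows the group only as far as needed for the rest of the pattern to match, so the character class that follows the group decides where each word ends. The Python sketch below (an illustration, not Fredet's Perl code; the HTML fragment is made up) compares a trailing \W with a trailing \w.

```python
import re

html = '<HTML><HEAD><meta http-equiv="content-type">'  # made-up sample input

# Lazy group followed by a non-word character: the group grows to the end of each word.
whole_words = re.findall(r"([a-z]{3,}?)\W", html, re.I)

# Lazy group followed by a word character: the group stops after three letters.
prefixes = re.findall(r"([a-z]{3,}?)\w", html, re.I)

print(whole_words)
print(prefixes)
```

The first call returns whole words (HTML, HEAD, meta, ...), which is the shape of the output in Example 4a; the second returns three-letter prefixes (HTM, HEA, met, ...).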
This example may help to extract the dynamic links inside a downloaded HTML page:
<check name="dlinks" description="find dynamic links in files">
<match display="$1 $2">(\w+)=[\"\']?(https*://.+\?.+?=.+?(?=[,\s\"\'<>]))</match>
<match display="txt $0">(?!=[\"\'])(https*://.+\?.+?=.+?(?=[,\s<>\"\']))</match>
</check>
Where:
* "display" is an optional parameter that allows us to specify the output format for this regexp
* "modifiers" may be added to the regular expression; for example, "i" makes the regexp "case-insensitive."
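To make the structure of a check concrete, here is a minimal Python sketch that loads one check written in the format above and compiles its regexps. It is an illustration under stated assumptions and not Fredet's own loader: the "numbers" check and the load_check helper are hypothetical, and the modifiers are assumed to be Perl's i, s, m and x, mapped onto the equivalent Python flags. Also note that a strict XML parser such as the one used here requires a literal "<" inside a pattern to be written as "&lt;"; how Fredet itself reads config.xml is not stated on this page.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical check, written in the format described above.
CHECK_XML = r"""
<check name="numbers" description="find numbers">
  <match modifiers="i" display="n=$0">(\d+)</match>
  <field name="number" index="1"/>
</check>
"""

# Assumed mapping from Perl-style modifier letters to Python regex flags.
FLAGS = {"i": re.I, "s": re.S, "m": re.M, "x": re.X}

def load_check(xml_text):
    """Parse one <check> element into a plain dictionary."""
    node = ET.fromstring(xml_text)
    matches = []
    for match in node.findall("match"):
        flags = 0
        for letter in match.get("modifiers", ""):
            flags |= FLAGS[letter]
        # "display" is optional, so it may be None.
        matches.append((re.compile(match.text, flags), match.get("display")))
    fields = {f.get("name"): int(f.get("index")) for f in node.findall("field")}
    return {
        "name": node.get("name"),
        "description": node.get("description"),
        "matches": matches,
        "fields": fields,
    }

check = load_check(CHECK_XML)
print(check["name"], check["fields"])
```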
Download Fredet v1.2