Scrape files for sensitive information, and generate an interactive HTML report. Based on Rabin2.
This tool is only as good as your RegEx skills.
You can also style your own report.
Tested on Kali Linux v2024.2 (64-bit).
Made for educational purposes. I hope it will help!
On Kali Linux, run:
apt-get -y install radare2
On Windows OS, download and unpack radareorg/radare2, then add the bin directory to the Windows PATH environment variable.
On macOS, run:
brew install radare2
pip3 install --upgrade file-scraper
git clone && cd file-scraper
python3 -m pip install --upgrade build
python3 -m build
python3 -m pip install dist/file_scraper-4.5-py3-none-any.whl
Prepare a template such as the default template:
"query":"(?:basic|bearer)\\ ",
"query":"(?:access|account|admin|auth|card|conf|cookie|cred|customer|email|history|ident|info|jwt|key|kyc|log|otp|pass|pin|priv|refresh|salt|secret|seed|session|setting|sign|token|transaction|transfer|user)[\\w\\d\\-\\_]*(?:\\\"\\ *\\:|\\ *\\=[^\\=]{1})",
"query":"(?:(?<!\\:)\\/\\/|\\#).*(?:bug|compatibility|crash|deprecated|fix|issue|legacy|problem|review|security|todo|to do|to-do|to_do|vuln|warning)",
"Abs. URL":{
"query":"-----BEGIN (?:CERTIFICATE|PRIVATE KEY)-----[\\s\\S]+?-----END (?:CERTIFICATE|PRIVATE KEY)-----",
Make sure each regular expression has at most one capturing group, so that the matches come back as a flat list, e.g., [1, 2, 3, 4], and not as a list of tuples, e.g., [(1, 2), (3, 4)].
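The difference is easy to see with Python's `re.findall`, which returns tuples as soon as a pattern has more than one capturing group. The snippet below is only an illustration; the sample text and patterns are made up and are not part of the default template.

```python
import re

# Made-up sample line, for illustration only.
text = "user=alice pass=secret"

# Non-capturing groups only: a flat list of whole matches.
whole = re.findall(r"(?:user|pass)=\w+", text)
# → ['user=alice', 'pass=secret']

# Exactly one capturing group: a flat list of that group's content.
single = re.findall(r"(?:user|pass)=(\w+)", text)
# → ['alice', 'secret']

# Two capturing groups: a list of tuples, which is what to avoid.
pairs = re.findall(r"(user|pass)=(\w+)", text)
# → [('user', 'alice'), ('pass', 'secret')]
```

Converting extra groups to non-capturing ones, `(?:...)`, is usually enough to get back to a flat list.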
Make sure to properly escape regular-expression-specific symbols in your template file, e.g., escape a dot . as \\. and a forward slash / as \\/, etc.
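One way to avoid escaping mistakes is to write the regular expression as a Python raw string and let `json.dumps` add the second backslash. This is a minimal sketch, not part of the tool: the entry name "Example Domain" and the pattern are hypothetical, and only the `query` and `ignorecase` keys come from the options table.

```python
import json
import re

# Hypothetical template entry; the name and the pattern are illustrative only.
regex = r"https?:\/\/example\.com"
template = {"Example Domain": {"query": regex, "ignorecase": True}}

# json.dumps doubles every backslash, so the file content contains
# "query": "https?:\\/\\/example\\.com"
encoded = json.dumps(template, indent=4)
print(encoded)

# Loading the JSON restores the single-backslash regular expression.
loaded = json.loads(encoded)
query = loaded["Example Domain"]["query"]
matches = re.findall(query, "see https://example.com/login and https://exampleXcom", re.IGNORECASE)
# → ['https://example.com']
```

Because the dot is escaped, `exampleXcom` is not matched; an unescaped dot would have matched any character in that position.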
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | str | yes | Regular expression query. |
| search | bool | no | Highlight matches within the searched lines; otherwise, extract the matches. |
| ignorecase | bool | no | Case-insensitive search. |
| minimum | int | no | Only accept matches longer than int characters. |
| maximum | int | no | Only accept matches shorter than int characters. |
| decode | str | no | Decode the matches. Available decodings: url, base64, hex, pem. |
| minimum_decode | int | no | Only accept decodings longer than int characters. |
| maximum_decode | int | no | Only accept decodings shorter than int characters. |
| unique | bool | no | Filter out duplicates. |
| collect | bool | no | Collect all the matches in one place. |
minimum_decode and maximum_decode check the length of the decoded string after bad characters are removed.
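The decode-then-filter behavior can be pictured roughly as follows. This is a sketch under stated assumptions, not the tool's actual code: the function name `decode_and_filter` is hypothetical, "bad characters" is assumed to mean bytes that are not valid UTF-8, and the strict comparisons follow the "longer than" and "shorter than" wording of the table.

```python
import base64
import binascii


def decode_and_filter(match, minimum_decode=None, maximum_decode=None):
    """Base64-decode a match and apply length limits to the decoded text.

    Hypothetical helper; returns None when the match is rejected.
    """
    try:
        raw = base64.b64decode(match, validate=True)
    except (binascii.Error, ValueError):
        return None  # not valid Base64
    # Assumption: "bad characters" are bytes that do not decode as UTF-8.
    decoded = raw.decode("utf-8", errors="ignore")
    # Assumption: both limits are strict, as the table wording suggests.
    if minimum_decode is not None and len(decoded) <= minimum_decode:
        return None
    if maximum_decode is not None and len(decoded) >= maximum_decode:
        return None
    return decoded


sample = base64.b64encode(b"secret-token").decode("ascii")
print(decode_and_filter(sample, minimum_decode=5))   # accepted, 12 characters
print(decode_and_filter(sample, maximum_decode=5))   # rejected, too long
```

The point is only that the limits apply to the cleaned, decoded text and not to the raw match, so a long Base64 blob that decodes to a few readable characters can still be filtered out.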
How I typically run the tool:
file-scraper -dir directory -o results.html -e default
Default (built-in) exclude file types:
car, css, gif, jpeg, jpg, mp3, mp4, nib, ogg, otf, eot, png, storyboard, strings, svg, ttf, webp, woff, woff2, xib, vtt
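As a rough mental model of how the `-e` and `-i` options interact, the sketch below filters paths by extension. It is an assumption-laden illustration, not the tool's implementation: the function name `should_scrape` is hypothetical, "overrides the excludes" is read as "when includes are given, only those extensions are scraped", and the case-insensitive comparison is also an assumption.

```python
from pathlib import Path

# The built-in exclude list quoted above.
DEFAULT_EXCLUDES = {
    "car", "css", "gif", "jpeg", "jpg", "mp3", "mp4", "nib", "ogg", "otf",
    "eot", "png", "storyboard", "strings", "svg", "ttf", "webp", "woff",
    "woff2", "xib", "vtt",
}


def should_scrape(path, excludes=frozenset(), includes=frozenset()):
    """Decide whether a file should be scraped, based on its extension."""
    extension = Path(path).suffix.lstrip(".").lower()
    if includes:
        # Assumed reading of "overrides the excludes".
        return extension in includes
    return extension not in excludes


print(should_scrape("assets/logo.png", excludes=DEFAULT_EXCLUDES))  # False
print(should_scrape("src/app.js", excludes=DEFAULT_EXCLUDES))       # True
print(should_scrape("src/app.js", includes={"java"}))               # False
```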
File Scraper v4.5 ( )
Usage: file-scraper -dir directory -o out [-t template] [-th threads]
Example: file-scraper -dir decoded -o results.html [-t template.json] [-th 10]
Scrape files for sensitive information
Directory containing files or a single file to scrape
-dir, --directory = decoded | files | test.exe | etc.
File containing extraction details or a single RegEx to use
Default: built-in JSON template file
-t, --template = template.json | "secret\: [\w\d]+" | etc.
Exclude all files ending with the specified extension
Specify 'default' to load the built-in list
Use comma-separated values
-e, --excludes = mp3 | default,jpg,png | etc.
Include all files ending with the specified extension
Overrides the excludes
Use comma-separated values
-i, --includes = java | json,xml,yaml | etc.
Beautify [minified] JavaScript (.js) files
-b, --beautify
Number of parallel threads to run
Default: 30
-th, --threads = 10 | etc.
Output file
-o, --out = results.html | etc.
Enable debug output
-dbg, --debug
Figure 1 - Interactive Report (1)
Figure 2 - Interactive Report (2)
Figure 3 - Interactive Report (3)