Information Technology Laboratory / AI Research, Measurement, and Standards Division

AI Standards and Guidelines Group

NSRL bulk_extractor 1.4.4 Data

NSRL has run bulk_extractor against our unique file corpus.

These data sets were created using bulk_extractor version 1.4.4 on Mac OS X.

The MD5, SHA1 and SHA256 file signatures for the zip files are available here
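As a rough illustration of how a downloaded zip file could be checked against the published signatures, the sketch below computes all three digests in a single pass over the file. The function name `file_digests` and the chunk size are illustrative choices, not part of any NSRL tooling, and the format of the published signature file is not assumed here.

```python
import hashlib
from pathlib import Path


def file_digests(path, chunk_size=1 << 20):
    """Return the MD5, SHA1 and SHA256 hex digests of the file at `path`.

    Hypothetical helper: the file is read in chunks so that a large zip
    archive does not have to fit in memory.
    """
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with Path(path).open("rb") as fh:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, `file_digests("run_be_0m.zip")["sha256"]` could then be compared by eye, or with a case-insensitive string comparison, against the SHA256 value published for that zip file.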

The corpus files are named using 9-digit strings. The NIST architecture imposes some limits on bulk_extractor, which was only able to process directories containing at most 500,000 files. Each zip file therefore contains the output of 2 runs, each over a set of 500,000 files, for a total of 1,000,000 files processed per zip file.

Files "000nnnnnn" are contained in "run_be_0m.zip"
Files "001nnnnnn" are contained in "run_be_1m.zip"
etc.
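The naming pattern above can be sketched as a small lookup. This assumes the "etc." continues the same way, so that the first three digits of a file name give the number in the zip file name; the function name `zip_for_corpus_file` is hypothetical.

```python
def zip_for_corpus_file(name):
    """Map a 9-digit corpus file name to the zip file expected to hold its results.

    Assumption: the leading three digits select the archive, following the
    "000nnnnnn" -> run_be_0m.zip and "001nnnnnn" -> run_be_1m.zip pattern.
    """
    if len(name) != 9 or not (name.isascii() and name.isdigit()):
        raise ValueError(f"expected a 9-digit string, got {name!r}")
    million = int(name[:3])  # "001" -> 1, "025" -> 25
    return f"run_be_{million}m.zip"
```

Under this assumption, `zip_for_corpus_file("001234567")` returns `"run_be_1m.zip"`. The sketch does not check that the resulting name is one of the zip files actually listed on this page.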

run_be_0m.zip
run_be_1m.zip
run_be_2m.zip
run_be_3m.zip
run_be_4m.zip
run_be_5m.zip
run_be_6m.zip
run_be_7m.zip
run_be_8m.zip
run_be_9m.zip
run_be_10m.zip
run_be_11m.zip
run_be_12m.zip
run_be_13m.zip
run_be_14m.zip
run_be_15m.zip
run_be_16m.zip
run_be_17m.zip
run_be_18m.zip
run_be_19m.zip
run_be_20m.zip
run_be_21m.zip
run_be_22m.zip
run_be_23m.zip
run_be_24m.zip
run_be_25m.zip