NSRL has run bulk_extractor against our unique file corpus.
These data sets were created using bulk_extractor version 1.4.4 on Mac OS X.
The MD5, SHA1, and SHA256 file signatures for the zip files are available here.
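A downloaded zip file can be checked against the published signatures by computing the same three digests locally. The following is a minimal sketch using Python's standard `hashlib` module; the function name `file_digests` and the example path are illustrative assumptions, and the layout of the published signature list is not described here, so the comparison step is left to the reader.

```python
import hashlib


def file_digests(path, chunk_size=1024 * 1024):
    """Compute the MD5, SHA1 and SHA256 hex digests of a file in one pass.

    The file is read in chunks so that large zip files do not have to fit
    in memory. `file_digests` is a hypothetical helper name, not part of
    any NSRL tooling.
    """
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, `file_digests("run_be_0m.zip")["sha256"]` returns a lowercase hex string; compare it case-insensitively with the published SHA256 value for that zip file.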
The corpus files are named using 9-digit strings. The NIST architecture has some limits when running bulk_extractor, and was only able to process directories containing at most 500,000 files. Each zip file therefore contains 2 runs, each over a set of 500,000 files, for a total of 1,000,000 files processed per zip file.
Files "000nnnnnn" are contained in "run_be_0m.zip",
files "001nnnnnn" are contained in "run_be_1m.zip",
and so on.
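The naming rule above can be sketched as a small lookup. This is an illustration only: it assumes that the pattern shown for "000nnnnnn" and "001nnnnnn" continues unchanged, so that the first three digits of a corpus file name give the number in the zip file name. The function name `zip_for_corpus_file` is hypothetical.

```python
import re


def zip_for_corpus_file(name):
    """Return the zip file name expected to hold results for a corpus file.

    Assumption: the first three digits of the 9-digit name select the
    archive, e.g. "000nnnnnn" -> "run_be_0m.zip".
    """
    if not re.fullmatch(r"[0-9]{9}", name):
        raise ValueError("corpus file names are 9-digit strings: %r" % (name,))
    million = int(name[:3])  # "001" -> 1; leading zeros are dropped
    return "run_be_%dm.zip" % million
```

Under that assumption, `zip_for_corpus_file("001234567")` returns `"run_be_1m.zip"`. The function does not check whether a zip file with the resulting name has actually been published.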
run_be_0m.zip
run_be_1m.zip
run_be_2m.zip
run_be_3m.zip
run_be_4m.zip
run_be_5m.zip
run_be_6m.zip
run_be_7m.zip
run_be_8m.zip
run_be_9m.zip
run_be_10m.zip
run_be_11m.zip
run_be_12m.zip
run_be_13m.zip
run_be_14m.zip
run_be_15m.zip
run_be_16m.zip
run_be_17m.zip
run_be_18m.zip
run_be_19m.zip
run_be_20m.zip
run_be_21m.zip
run_be_22m.zip
run_be_23m.zip
run_be_24m.zip
run_be_25m.zip