NSRL has run bulk_extractor against our unique file corpus.
These data sets were created using bulk_extractor version 1.4.4 on Mac OS X.
The MD5, SHA1, and SHA256 file signatures for the zip files are available here.
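A downloaded zip file can be checked against the published signatures by computing the same three digests locally. The following is a minimal sketch using Python's standard hashlib module; the function name file_digests and the chunk size are illustrative choices, not part of the NSRL tooling.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return the MD5, SHA1 and SHA256 hex digests of the file at `path`.

    `file_digests` is a hypothetical helper written for this example.
    """
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so a large zip file is never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, file_digests("run_be_0m.zip")["sha256"] could be compared, as a lowercase hex string, with the SHA256 value published for that zip file.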
The corpus files are named using 9-digit strings. In the NIST environment, bulk_extractor could only process directories containing at most 500,000 files. Each zip file therefore contains two runs, each over a set of 500,000 files, for a total of 1,000,000 files processed per zip file.
Files "000nnnnnn" are contained in "run_be_0m.zip".
Files "001nnnnnn" are contained in "run_be_1m.zip".
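The naming rule above can be sketched as a small lookup. This example assumes that the first three digits of the 9-digit name select the zip file and that the pattern continues past the two cases listed here (for instance, "002nnnnnn" in "run_be_2m.zip"); that extension is an assumption, and zip_for_corpus_file is a hypothetical helper name.

```python
def zip_for_corpus_file(name):
    """Map a 9-digit corpus file name to the zip file expected to hold its results.

    Only the 000 and 001 prefixes are confirmed by the description above;
    treating other prefixes the same way is an assumption.
    """
    if len(name) != 9 or not (name.isascii() and name.isdigit()):
        raise ValueError(f"expected a 9-digit corpus file name, got {name!r}")
    # The leading three digits count whole millions of files.
    million_index = int(name[:3])
    return f"run_be_{million_index}m.zip"
```

With this sketch, zip_for_corpus_file("000123456") gives "run_be_0m.zip" and zip_for_corpus_file("001000000") gives "run_be_1m.zip", matching the two cases listed.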