NSRL Unique File Corpus

The NSRL keeps one copy of each unique file encountered in processing, where uniqueness is defined by the file's SHA-1 hash.

Each file is assigned an integer identification number as it is encountered, and the file is stored in a directory and filename structure based on that integer.
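This page does not describe the NSRL's processing code. As a minimal sketch, assuming a simple in-memory table, deduplicating by SHA-1 and handing out sequential IDs (starting at one) might look like the following in Python. `sha1_of_file` and `UniqueFileRegistry` are hypothetical names, not part of any NSRL tool.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class UniqueFileRegistry:
    """Hypothetical in-memory table mapping each distinct SHA-1 to an integer ID."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._next_id = 1  # numbering starts at one

    def register(self, sha1_hex: str) -> tuple[int, bool]:
        """Return (file_id, is_new); a repeated hash reuses its existing ID."""
        key = sha1_hex.lower()  # compare hex digests case-insensitively
        if key in self._ids:
            return self._ids[key], False
        file_id = self._next_id
        self._ids[key] = file_id
        self._next_id += 1
        return file_id, True
```

For each file encountered, `registry.register(sha1_of_file(p))` returns a new ID only the first time a given hash is seen; in this sketch, that is the only case in which the file would need to be copied into the corpus.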

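The numbering starts at one (1). A top-level directory is created for every 1,000,000 files, and within each of those, a subdirectory is created for every 1,000 files. The filename is the file number written as a nine-character, zero-padded decimal string (e.g., $filename = sprintf("%09d", $fileID)); the top-level directory name is the first three digits of that string and the subdirectory name is the next three.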

Thus file number 1 is stored in the directory/filename "000/000/000000001". File number 12,345,678 is stored in "012/345/012345678".
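The layout above can be sketched in Python as follows. `corpus_path` and `file_id_from_path` are illustrative names, not part of any NSRL tool; the upper bound of 999,999,999 is an assumption drawn from the nine-digit filename, which cannot hold a larger number.

```python
MAX_FILE_ID = 999_999_999  # largest ID a nine-digit filename can hold


def corpus_path(file_id: int) -> str:
    """Map a file ID to its relative corpus path, e.g. 1 -> "000/000/000000001"."""
    if not 1 <= file_id <= MAX_FILE_ID:
        raise ValueError(f"file ID out of range: {file_id}")
    top = file_id // 1_000_000          # one directory per 1,000,000 files
    sub = (file_id // 1_000) % 1_000    # one subdirectory per 1,000 files
    # {file_id:09d} pads the same way as sprintf("%09d", $fileID)
    return f"{top:03d}/{sub:03d}/{file_id:09d}"


def file_id_from_path(path: str) -> int:
    """Inverse of corpus_path; raises ValueError if path is not in the layout."""
    file_id = int(path.rsplit("/", 1)[-1])
    if corpus_path(file_id) != path:
        raise ValueError(f"not a corpus path: {path}")
    return file_id
```

Because the mapping is pure arithmetic, no lookup table is needed to locate a file by its number, and the filename alone is enough to recover the number.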

A tab-delimited file is available that contains the corpus location, SHA-1, byte count, and full original path of each file in the corpus. It is distributed as a 650MB Zip file along with a file hash signature, and an example view of this data is also provided.
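A minimal parser for that file is sketched below. It assumes one record per line with the four fields in the order listed above (corpus location, SHA-1, byte count, original path) and no header row or quoting; those assumptions should be checked against the example view before relying on it. `CorpusEntry` and `parse_manifest` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CorpusEntry:
    location: str        # corpus location, kept as the raw string from the file
    sha1: str
    byte_count: int
    original_path: str


def parse_manifest(lines: Iterable[str]) -> Iterator[CorpusEntry]:
    """Yield one CorpusEntry per non-empty line.

    Assumed column order: location, SHA-1, byte count, original path.
    A header row, if present, must be skipped by the caller.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # maxsplit=3 keeps any tab inside the original path intact
        location, sha1, size, original_path = line.split("\t", 3)
        yield CorpusEntry(location, sha1, int(size), original_path)
```

Because `parse_manifest` is a generator, the extracted file can be streamed line by line from an open file object rather than loaded into memory at once. The file's text encoding is not stated on this page, so opening it with an explicit `encoding` and `errors` policy is advisable.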


Contact

Please send questions or comments to nsrl [at] nist.gov.

Created May 16, 2016, Updated July 19, 2017