NSRL Unique File Corpus

The NSRL keeps one copy of each unique file encountered in processing, where uniqueness is defined by the file's SHA-1 hash string.

Each file is assigned an integer identification number as it is encountered, and the file is stored in a directory and filename structure based on that integer.

The numbering starts at one (1). A directory is created for every 1,000,000 files, and within each of those directories, a subdirectory is created for every 1,000 files. The filename is a nine-character string, left-padded with zeros (e.g. $filename = sprintf("%09d", $fileID) ).

Thus file number 1 is stored in the directory/filename "000/000/000000001". File number 12,345,678 is stored in "012/345/012345678".
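
The mapping from identification number to corpus location can be sketched as follows. This is a minimal illustration, not NSRL code: the function name corpus_path is hypothetical, and the upper bound on the ID is an assumption that follows only from the nine-character filename width.

```python
def corpus_path(file_id: int) -> str:
    """Return the relative corpus location for an NSRL file ID.

    Hypothetical helper: the name and the range check are assumptions,
    not part of any published NSRL tool.
    """
    # Numbering starts at 1; nine digits cap the ID at 999,999,999 (assumed).
    if not 1 <= file_id <= 999_999_999:
        raise ValueError("file_id must be between 1 and 999,999,999")

    millions = file_id // 1_000_000          # one directory per 1,000,000 files
    thousands = (file_id // 1_000) % 1_000   # one subdirectory per 1,000 files
    filename = f"{file_id:09d}"              # same as sprintf("%09d", $fileID)

    return f"{millions:03d}/{thousands:03d}/{filename}"
```

Under these assumptions, corpus_path(1) gives "000/000/000000001" and corpus_path(12345678) gives "012/345/012345678", matching the two examples above.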

A tab-delimited file is available which contains the corpus location, SHA-1, byte count, and full original path of each file in the corpus. It is distributed as a 650MB Zip file, accompanied by a file hash signature for the download. An example view of this data is also available.
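
One way to read a line of the tab-delimited listing is sketched below. Several things here are assumptions rather than documented facts: that the columns appear in the order listed above (corpus location, SHA-1, byte count, original path), that the byte count is a decimal integer, and that there is no header row. The names CorpusEntry and parse_entry are hypothetical.

```python
from typing import NamedTuple


class CorpusEntry(NamedTuple):
    """One row of the listing (field order is an assumption)."""
    location: str       # e.g. "012/345/012345678"
    sha1: str           # SHA-1 hash string
    size: int           # byte count
    original_path: str  # full original path of the file


def parse_entry(line: str) -> CorpusEntry:
    """Parse a single tab-delimited line into a CorpusEntry."""
    # Split on the first three tabs only, so a tab inside the original
    # path (the last field) would not break the parse.
    location, sha1, size, original_path = line.rstrip("\r\n").split("\t", 3)
    return CorpusEntry(location, sha1, int(size), original_path)
```

Limiting the split to three tabs is a defensive choice; if the real file quotes or escapes fields differently, the parsing would need to change accordingly.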

 


Created May 16, 2016, Updated January 27, 2022