NIST Special Database 2

NIST Structured Forms Reference Set of Binary Images (SFRS)

Price: No charge

An MD5 file is available for download.
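As a rough sketch of how an MD5 file is typically used, the snippet below computes the MD5 digest of a downloaded file and compares it with a published hex value. The layout of the NIST MD5 file is not described on this page, so the functions `md5_hex` and `matches_checksum` and the idea of passing in a single expected hex string are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's MD5 digest with an expected value (hypothetical helper).

    Case and surrounding whitespace in the expected value are ignored.
    """
    return md5_hex(path) == expected_hex.strip().lower()
```

A mismatch usually points to an incomplete or corrupted download rather than a problem with the data itself.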

The NIST Structured Forms Database consists of 5,590 pages of binary, black-and-white images of synthesized documents.

The documents in this database are 12 different tax forms from the IRS 1040 Package X for the year 1988. These include Forms 1040, 2106, 2441, 4562, and 6251 together with Schedules A, B, C, D, E, F, and SE.

Eight of these forms contain two pages or form faces; therefore, there are 20 different form faces represented in the database.

The document images in this database appear to be real forms prepared by individuals, but the images have been automatically derived and synthesized using a computer.

There are 900 simulated tax submissions represented in the database averaging 6.2 form faces per submission.
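The counts quoted above are consistent with one another, which a short calculation makes explicit. This is only an arithmetic check on the figures stated on this page; the variable names are illustrative.

```python
forms = 12           # distinct tax forms in the database
two_page_forms = 8   # forms that have a second page (form face)
# Each two-page form contributes one extra face beyond its first page.
form_faces = forms + two_page_forms

images = 5590        # total form-face images
submissions = 900    # simulated tax submissions
faces_per_submission = images / submissions

print(form_faces)                      # 20
print(round(faces_per_submission, 1))  # 6.2
```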

The database has the following features:

  • 900 simulated tax submissions
  • 5,590 images of completed structured form faces
  • 5,590 text files containing entry field answers
  • 20 tables of entry field types and contexts

Suitable for both document processing and automated data capture research, development, and evaluation, the data set can be used for:

  • forms identification
  • field isolation: locating the entry fields on the form
  • character segmentation: separating entry field values into characters
  • character recognition: identifying specific machine printed characters

This database is a valuable tool for measurement of system performance and system comparison on complex forms.
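One simple way to measure system performance against the entry field answers is field-level accuracy: the fraction of fields whose recognized value exactly matches the reference value. The sketch below assumes the reference answers and the system output have already been loaded into dictionaries keyed by field name. The answer file format is not described on this page, so the function `field_accuracy`, the dictionary layout, and the field names are all hypothetical.

```python
def field_accuracy(reference: dict[str, str], recognized: dict[str, str]) -> float:
    """Fraction of reference fields whose recognized value matches exactly.

    Fields missing from `recognized` count as errors.
    """
    if not reference:
        raise ValueError("reference must contain at least one field")
    correct = sum(
        1 for name, value in reference.items() if recognized.get(name) == value
    )
    return correct / len(reference)


# Hypothetical field names and values, for illustration only.
reference = {"field_a": "1234", "field_b": "SMITH"}
recognized = {"field_a": "1234", "field_b": "SM1TH"}
print(field_accuracy(reference, recognized))  # 0.5
```

Exact-match scoring is strict; a character-level measure such as edit distance would give partial credit, and which one is appropriate depends on the system being compared.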

A PDF version of the Users' Guide is available.

A representative image file of a completed form in NIST Special Database 2

For more information on Special Database 2 please contact:
Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 6410
Gaithersburg, MD 20899-6410
(844) 374-0183 (Toll Free) 

The scientific contact for this database is:
Michael Garris
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Gaithersburg, MD 20899-8940
mgarris@nist.gov

Keywords: ASCII Reference, automated character recognition, automated data capture, forms identification, image, IRS, NIST, Machine Print, OCR, optical character recognition, printed characters, software recognition, synthesized documents, tax forms

 


Created August 27, 2010, Updated April 27, 2019