
NIST Special Database 19

NIST Handprinted Forms and Characters Database

Special Database 19 contains NIST's entire corpus of training materials for handprinted document and character recognition. It includes Handprinted Sample Forms (HSF) from 3,600 writers, 810,000 character images isolated from those forms, ground-truth classifications for the images, reference forms for further data collection, and software utilities for image management and handling.

The features of this database are:

  • Final accumulation of NIST's handprinted sample data
  • Full page HSF forms from 3600 writers
  • Separate digit, upper and lower case, and free text fields
  • Over 800,000 images with hand checked classifications


The database is NIST's largest and probably final release of images intended for handprint document processing and OCR research. The full-page images are the default input to the NIST FORM-BASED HANDPRINT RECOGNITION SYSTEM, a public-domain release of end-to-end recognition software.

2nd Edition – September 2016

Download – by_class.zip (MD5 hash file)
Download – by_field.zip (MD5 hash file)
Download – by_merge.zip (MD5 hash file)
Download – by_write.zip (MD5 hash file)
Download – hsf_page.zip (MD5 hash file)
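Each archive above is paired with an MD5 hash file. The following Python sketch shows one way to check a downloaded archive against its hash before unpacking it. It assumes the hash file uses the common md5sum layout, with the hex digest as the first whitespace-separated token; if the published files are laid out differently, adjust `read_expected_md5` accordingly.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_expected_md5(hash_file_path):
    """Return the expected digest from a hash file.

    Assumption: the file follows the md5sum layout "<hex digest>  <filename>",
    so the first token is the digest. Case is normalized to lowercase.
    """
    text = Path(hash_file_path).read_text().strip()
    return text.split()[0].lower()


def verify_download(archive_path, hash_file_path):
    """Return True if the archive's MD5 matches the published value."""
    return md5_of_file(archive_path) == read_expected_md5(hash_file_path)
```

For example, `verify_download("by_class.zip", "by_class.md5")` returns `True` for an intact download; the hash-file name here is a placeholder, so substitute the name of the file you actually downloaded.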

Users' Guide, 2nd Edition (PDF)

Example HSF Image. This is the file hsf_page/hsf_0/f0002_01.pct. Notice that the first field on this form, the name field, has been intentionally occluded; on some other forms it remains blank. All fields except those on the first line have been segmented and …
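The example path suggests that hsf_page.zip stores full-page images as hsf_page/hsf_N/<form>.pct. Assuming that layout holds throughout the archive, the Python sketch below inventories the page images by subdirectory without extracting them. The function name `list_hsf_pages` is hypothetical, and the sketch only lists files; it does not decode the .pct image format.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def list_hsf_pages(zip_path):
    """Group the .pct page images in an HSF archive by parent directory.

    Assumes member paths look like "hsf_page/hsf_0/f0002_01.pct".
    Returns a dict such as {"hsf_0": ["f0002_01.pct", ...], ...}.
    """
    pages = defaultdict(list)
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)  # zip member names use "/" separators
            if path.suffix.lower() == ".pct":
                pages[path.parent.name].append(path.name)
    # Sort directories and file names so the listing is reproducible.
    return {folder: sorted(files) for folder, files in sorted(pages.items())}
```

Reading the member list from the zip index is cheap, so this works on the full archive without unpacking it to disk.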

1st Edition - March 1995

Download – 1st Edition1995.zip (MD5 hash file)

Users' Guide, 1st Edition (PDF)

The EMNIST dataset is a set of handwritten characters and digits derived from NIST Special Database 19 and converted to a 28x28 pixel image format and dataset structure that directly match the MNIST dataset.
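To illustrate what fitting a larger SD19 character image into a 28x28 MNIST-style grid involves, the Python sketch below crops a binary character to its ink bounding box, centers it in a square to preserve the aspect ratio, and resamples it into the grid with a blank border. This is not the EMNIST authors' conversion procedure: it uses simple nearest-neighbour sampling on 0/1 pixels. The function name `to_28x28` and the 2-pixel `border` default are illustrative assumptions.

```python
def to_28x28(image, border=2):
    """Center a binary character image in a 28x28 grid with a blank border.

    `image` is a list of equal-length rows of 0/1 values (1 = ink). This is an
    illustrative nearest-neighbour resample, not the EMNIST conversion code.
    """
    size = 28
    out = [[0] * size for _ in range(size)]
    # Locate the bounding box of the ink pixels.
    ink = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    if not ink:
        return out  # blank input -> blank output
    top = min(r for r, _ in ink)
    bottom = max(r for r, _ in ink)
    left = min(c for _, c in ink)
    right = max(c for _, c in ink)
    h, w = bottom - top + 1, right - left + 1

    # Copy the crop into the middle of a square so the aspect ratio is kept.
    side = max(h, w)
    off_r, off_c = (side - h) // 2, (side - w) // 2
    square = [[0] * side for _ in range(side)]
    for r in range(h):
        for c in range(w):
            square[off_r + r][off_c + c] = image[top + r][left + c]

    # Nearest-neighbour resample of the square into the inner region.
    inner = size - 2 * border
    for r in range(inner):
        for c in range(inner):
            out[border + r][border + c] = square[r * side // inner][c * side // inner]
    return out
```

Centering inside a square before resampling keeps tall, narrow characters such as "1" or "l" from being stretched horizontally. A smoother result would come from averaging or interpolating grey levels rather than sampling single pixels.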

For more information please contact:
Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 6410
Gaithersburg, MD 20899-6410
(844) 374-0183 (Toll Free) 

 

The scientific contact for this database is:
Patrick J. Grother
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-4157
patrick.grother [at] nist.gov

Keywords: Automated character recognition; automated data capture; character recognition; forms recognition; handwriting recognition; OCR; optical character recognition; software recognition.

DOI: http://doi.org/10.18434/T4H01C

If you have any questions regarding this website, or notice any problems or inaccurate information, please contact the webmaster by sending e-mail to: data [at] nist.gov

Created August 27, 2010, Updated April 27, 2019