NIST Machine-Print Database of Gray Scale and Binary Images (MPDB)


This database has been discontinued and is no longer available.
 

The NIST machine-printed database, formerly part of the Special Databases collection, contained gray scale and binary images of machine-printed pages. It was previously known as Special Database 8.

The set contained a total of 3,063,168 characters, an average of about 8,509 characters per page.

A reference file was included for each page. These reference files are the ASCII text pages that were used to generate the original hardcopy that was digitized.

This database was distributed for use in developing and testing Optical Character Recognition (OCR) systems on a common set of images, which allowed vendors to report results against the same image set.
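As an illustration of how a result on a common image set might be reported, the sketch below scores OCR output for one page against its ASCII reference text. The page does not say which metric vendors used; character accuracy based on edit distance is an assumption here, and the function names `edit_distance` and `character_accuracy` are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning reference into hypothesis."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Fraction of reference characters recognized correctly (assumed metric).

    The value can drop below zero when the OCR output contains many
    spurious insertions.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    return 1.0 - edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference_text = "The quick brown fox"   # stand-in for a page reference file
    ocr_output = "The qu1ck brown f0x"       # stand-in for an OCR system's output
    print(f"{character_accuracy(reference_text, ocr_output):.3f}")
```

Dividing by the length of the reference text keeps scores comparable across systems, since every system is measured against the same ground truth.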

The database had the following features:

  • 3 font styles: Bold, Italics, and Normal
  • 6 font types: Courier, Helvetica, New Century Schoolbook, Optima, Palatino, and Times Roman
  • 10 point sizes: 4, 5, 6, 7, 8, 10, 11, 12, 15, 17, and 20
  • randomly ordered and sequentially ordered pages
  • 360 unique pages each having a gray scale and binary representation
  • 12 pixels/mm resolution
  • 360 text files containing page reference answers
  • image format documentation and example software
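
The figures on this page can be cross-checked with a little arithmetic. The sketch below recomputes the average characters per page and converts the 12 pixels/mm scanning resolution into dots per inch and into approximate glyph heights in pixels. It assumes a point is 1/72 inch (the desktop-publishing point); the page does not say which point definition was used, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # assumption: desktop-publishing point, not stated on the page

TOTAL_CHARACTERS = 3_063_168   # from the page
PAGE_COUNT = 360               # from the page
RESOLUTION_PX_PER_MM = 12      # from the page


def average_characters_per_page(total_characters: int, pages: int) -> float:
    """Mean number of characters on a page."""
    return total_characters / pages


def pixels_per_mm_to_dpi(px_per_mm: float) -> float:
    """Convert a resolution in pixels per millimetre to dots per inch."""
    return px_per_mm * MM_PER_INCH


def point_size_to_pixels(points: float, px_per_mm: float) -> float:
    """Nominal height in pixels of text set at the given point size."""
    millimetres = points / POINTS_PER_INCH * MM_PER_INCH
    return millimetres * px_per_mm


if __name__ == "__main__":
    average = average_characters_per_page(TOTAL_CHARACTERS, PAGE_COUNT)
    print(f"average characters per page: {average:.1f}")
    print(f"resolution: {pixels_per_mm_to_dpi(RESOLUTION_PX_PER_MM):.1f} dpi")
    for size in (4, 20):  # smallest and largest listed point sizes
        pixels = point_size_to_pixels(size, RESOLUTION_PX_PER_MM)
        print(f"{size}-point text: about {pixels:.0f} pixels")
```

Under these assumptions, 12 pixels/mm corresponds to roughly 305 dpi, so the smallest listed size (4 point) spans about 17 pixels and the largest (20 point) about 85 pixels.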


Suitable for automated machine-print research, development, and evaluation, the data set could be used for:

  • algorithm development
  • system training and testing
  • character segmentation: separating a full-page image into individual characters
  • character recognition: identifying specific machine-printed characters


The database was a valuable tool for measurement and comparison of system performance on machine-print pages.

The contact for this database is:

Patricia Flanagan
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Gaithersburg, MD 20899-8940

flanagan@nist.gov     

 

Keywords: ASCII Reference, automated character recognition, automated data capture, binary, character recognition, font size, full page, Grayscale Image Database, machine print, NIST, OCR, optical character recognition, software recognition, style.
 



 

Contact

Standard Reference Data, NIST:
100 Bureau Drive, Stop 6410
Gaithersburg, MD 20899-6410
(844) 374-0183 (Toll Free)

If you have any questions regarding this website, or notice any problems or inaccurate information, please contact the webmaster by sending e-mail to: data@nist.gov

Created August 27, 2010, Updated July 23, 2018