This database has been discontinued and is no longer available.
The NIST machine-printed database, formerly part of the Special Databases collection, contained gray scale and binary images of machine-printed pages. The database was previously known as Special Database 8.
The set contained a total of 3,063,168 characters, an average of about 8,509 characters per page.
A reference file was included for each page. These reference files are the ASCII text pages that were used to generate the original hardcopy that was digitized.
This database was distributed for use in the development and testing of Optical Character Recognition (OCR) systems on a common set of images, which allowed vendors to report results against the same image set.
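As an illustration of how a page's reference file could serve as ground truth, the sketch below scores a recognizer's output against reference text using character-level edit distance. The function names `edit_distance` and `character_accuracy` and the sample strings are hypothetical and not part of the database's example software; the scoring rule is one common convention, not necessarily the one vendors used when reporting results.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string
    into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Fraction of reference characters recovered, floored at 0.
    Assumed scoring rule: 1 - edit_distance / len(reference)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical strings standing in for a reference file and OCR output.
    reference_text = "machine printed page"
    ocr_output = "machlne printed pagc"
    print(character_accuracy(reference_text, ocr_output))
```

Because every system would be scored against the same reference text for the same page, accuracy figures computed this way are directly comparable across systems.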
The database had the following features:
- 3 font styles: Bold, Italic, and Normal
- 6 font types: Courier, Helvetica, New Century Schoolbook, Optima, Palatino, and Times Roman
- 10 point sizes: 4, 5, 6, 7, 8, 10, 11, 12, 15, 17, and 20
- pages in randomly generated order and in sequential order
- 360 unique pages, each with a gray scale and a binary representation
- 12 pixels/mm resolution
- 360 text files containing page reference answers
- image format documentation and example software
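The figures in this list can be cross-checked with a few lines of arithmetic. The sketch below assumes, without confirmation from the source, that the 360 pages correspond to one page per combination of font style, font type, point size, and page ordering (random or sequential). It uses the stated count of 10 point sizes, although the list above enumerates 11 values. The variable names are illustrative only.

```python
# Counts stated in the feature list.
FONT_STYLES = 3
FONT_TYPES = 6
POINT_SIZES = 10        # stated count; the enumerated list shows 11 values
PAGE_ORDERINGS = 2      # assumption: random and sequential
TOTAL_PAGES = 360
TOTAL_CHARACTERS = 3_063_168
PIXELS_PER_MM = 12
MM_PER_INCH = 25.4

# Assumed structure: one page per style/font/size/ordering combination.
combinations = FONT_STYLES * FONT_TYPES * POINT_SIZES * PAGE_ORDERINGS

# Average characters per page, from the stated totals.
average_characters = TOTAL_CHARACTERS / TOTAL_PAGES

# Scan resolution converted from pixels/mm to dots per inch.
dots_per_inch = PIXELS_PER_MM * MM_PER_INCH

if __name__ == "__main__":
    print("combinations:", combinations)
    print("average characters per page:", round(average_characters, 1))
    print("resolution in dpi:", round(dots_per_inch, 1))
```

Under that assumption the product matches the stated 360 pages, the average works out to roughly 8,508.8 characters per page, and 12 pixels/mm corresponds to about 305 dots per inch.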
Suitable for automated machine-print research, development, and evaluation, the data set could be used for:
- algorithm development
- system training and testing
- character segmentation: separating a full-page image into characters
- character recognition: identifying specific machine-printed characters
The database was a valuable tool for measurement and comparison of system performance on machine-print pages.
The contact for this database is:
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Gaithersburg, MD 20899-8940
Keywords: ASCII Reference, automated character recognition, automated data capture, binary, character recognition, font size, full page, Grayscale Image Database, machine print, NIST, OCR, optical character recognition, software recognition, style.