This database has been discontinued and is no longer supported, but it remains available upon request.
This database was formerly part of the NIST Special Databases collection, where it was known as Special Database 20. The images contain a very rich set of graphic elements, such as graphs, tables, equations, two-column text, maps, pictures, footnotes, annotations, and arrays of such elements. No ground-truth data or original typesetting information is available.
The images contain predominantly machine printed English, although three French and German documents are included.
Major features of the database include:
- 104 articles, books, journals
- 23,468 full-page binary images
- high resolution: 15.75 dots per mm (400 dpi)
- 4 compact discs, each containing about 500 MB
- CCITT Group 4 compression source code
- a structural statistics file for each image
- page rotation estimates
- software utilities
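The figures in the list above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the database's software utilities: the function names are hypothetical, and the per-image estimate assumes 1 MB = 1000 KB and that the full disc contents are image data, which overstates the result because the discs also hold source code, statistics files, and utilities.

```python
MM_PER_INCH = 25.4  # exact, by definition of the inch


def dpi_to_dots_per_mm(dpi: float) -> float:
    """Convert a scan resolution from dots per inch to dots per millimetre."""
    return dpi / MM_PER_INCH


def average_image_size_kb(num_discs: int, mb_per_disc: float, num_images: int) -> float:
    """Rough upper bound on the average compressed image size in KB.

    Assumption: 1 MB = 1000 KB and every byte on the discs is image data.
    """
    return num_discs * mb_per_disc * 1000 / num_images


if __name__ == "__main__":
    # Values taken from the feature list: 400 dpi, 4 discs of about 500 MB, 23,468 images.
    print(f"{dpi_to_dots_per_mm(400):.2f} dots per mm")
    print(f"about {average_image_size_kb(4, 500, 23_468):.0f} KB per compressed image")
```

Under these assumptions, 400 dpi corresponds to the quoted 15.75 dots per mm, and each compressed page image averages at most roughly 85 KB.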
A PDF version of the Users' Guide is available.
The database is available as a set of four 5.25-inch CD-ROMs.
System requirements: CD-ROM drive with software to read ISO-9660 format.
The contact for this database is:
National Institute of Standards and Technology
100 Bureau Drive,
Gaithersburg, MD 20899-8940
Keywords: Automated character recognition; automated image recognition; full text databases; OCR; optical character recognition; software recognition.