Creating and Validating a Large Database for METTREC



W Klein, Michael D. Garris


The National Institute of Standards and Technology [NIST] is in the process of setting up a new series of conferences named the Metadata Text Retrieval Conferences [METTREC]. The series will focus on evaluating document conversion using optical character recognition [OCR] and information retrieval [IR] technologies. Evaluations will be designed to investigate the impact of machine recognition errors upon information retrieval and to determine what interfaces are appropriate to integrate the two technologies. To implement this conference, NIST requires databases that can be used for conference evaluations and has chosen the Federal Register as the initial document source. It is a large, complete set of documents containing metadata that will allow quantitative evaluation of recognition and retrieval technologies. This paper describes the activities associated with scanning the Federal Register and validating the document images within the database. The process of image validation includes translating filenames, assuring image quality, and verifying correct page sequences. To reduce the cost of validation, we minimized human resource expenditure by exploiting OCR and high-speed visual adjudication of images by an operator. This process minimizes the expensive handling of paper to validate document image collections.
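
As a rough illustration of the page-sequence check mentioned above, the following Python sketch flags scanned images whose OCR-read page number does not follow from the previous image, so that only those images would go to an operator for visual adjudication. This is not the procedure from the report: the function name `find_sequence_breaks`, the input format (one OCR-read page number per image in scan order, or `None` when OCR could not read one), and the rule that consecutive images should differ by exactly one page are all assumptions made for the example.

```python
from typing import List, Optional


def find_sequence_breaks(page_numbers: List[Optional[int]]) -> List[int]:
    """Return indices of images that should be reviewed by an operator.

    Hypothetical helper: an image is flagged when OCR produced no page
    number for it, or when its page number is not the one expected from
    the preceding images (assumed rule: pages increase by exactly one).
    """
    flagged = []
    expected = None  # page number we expect on the next image, if known
    for index, number in enumerate(page_numbers):
        if number is None or (expected is not None and number != expected):
            flagged.append(index)
        if number is not None:
            # Resynchronize on whatever OCR actually read.
            expected = number + 1
        elif expected is not None:
            # Unreadable page: assume it was the expected one and move on.
            expected += 1
    return flagged


if __name__ == "__main__":
    # Image 2 has no readable page number; image 4 skips page 105.
    print(find_sequence_breaks([101, 102, None, 104, 106, 107]))
```

Under these assumptions an operator would look at a small subset of images on screen instead of paging through the paper originals, which is the cost saving the abstract describes.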
Report Number: 6090


CD ROM, document, image database, information retrieval, METTREC, OCR, optical character recognition, quality, scanning


Klein, W. and Garris, M. (1997), Creating and Validating a Large Database for METTREC, - 6090, National Institute of Standards and Technology, Gaithersburg, MD, [online] (Accessed July 15, 2024)


Created December 1, 1997, Updated February 25, 2009