Retrieving Noisy Text

Published

Author(s)

Ellen M. Voorhees, John S. Garofolo

Abstract

Two tracks within TREC have examined the problem of retrieving noisy documents, that is, documents whose content is not necessarily a faithful representation of the author's intent. The confusion track tested the ability of systems to retrieve documents that were the output of an optical character recognition (OCR) process. The spoken document retrieval track explored the feasibility of providing content-based access to recordings of speech by retrieving the output of an automatic speech recognizer. Both tracks found that the noise introduced by these processes can be compensated for, such that the effectiveness of retrieving the noisy text is comparable to that of clean text for a broad range of error rates.
Publisher Info
TREC Chapter to be published in: TREC: Experiment and Evaluation in Information Retrieval, 2005,

Keywords

information retrieval, OCR, speech recognition, TREC

Citation

Voorhees, E. and Garofolo, J. (2004), Retrieving Noisy Text, TREC Chapter to be published in: TREC: Experiment and Evaluation in Information Retrieval, 2005 (Accessed May 23, 2024)

Created September 26, 2004, Updated February 17, 2017