Two tracks within TREC have examined the problem of retrieving noisy documents, that is, documents whose content is not necessarily a faithful representation of the author's intent. The confusion track tested the ability of systems to retrieve documents that were the output of an optical character recognition (OCR) process. The spoken document retrieval track explored the feasibility of providing content-based access to recordings of speech by retrieving the output of an automatic speech recognizer. Both tracks found that the noise introduced by these processes can be compensated for, such that the effectiveness of retrieving the noisy text is comparable to that of clean text for a broad range of error rates.
Citation
Retrieving Noisy Text
Publisher Info
TREC Chapter to be published in: TREC: Experiment and Evaluation in Information Retrieval, 2005
Pub Type
Books
Keywords
information retrieval, OCR, speech recognition, TREC
Citation
Voorhees, E. and Garofolo, J. (2004), Retrieving Noisy Text, TREC Chapter to be published in: TREC: Experiment and Evaluation in Information Retrieval, 2005 (Accessed November 30, 2023)