Two tracks within TREC have examined the problem of retrieving noisy documents: documents whose content is not necessarily a faithful representation of the author's intent. The confusion track tested the ability of systems to retrieve documents that were the output of an optical character recognition (OCR) process. The spoken document retrieval track explored the feasibility of providing content-based access to recordings of speech by retrieving the output of an automatic speech recognizer. Both tracks found that the noise introduced by these processes can be compensated for, so that the effectiveness of retrieving the noisy text is comparable to that of retrieving clean text across a broad range of error rates.
Retrieving Noisy Text
TREC Chapter to be published in: TREC: Experiment and Evaluation in Information Retrieval, 2005.
information retrieval, OCR, speech recognition, TREC
and Garofolo, J.
Retrieving Noisy Text, TREC Chapter to be published in: TREC: Experiment and Evaluation in Information Retrieval, 2005,