Author(s)
Ellen M. Voorhees, D. M. Tice
Abstract
The TREC-8 Question Answering track was the first large-scale evaluation of systems that return answers, as opposed to lists of documents, in response to a question. As a first evaluation, it is important to examine the evaluation methodology itself to understand any limits on the conclusions that can be drawn from the evaluation and possibly to find ways to improve subsequent evaluations. This paper has two main goals: to describe in detail how the evaluation was implemented, and to examine the consequences of the methodology on the comparative performance of the systems participating in the evaluation. The examination uncovered no serious flaws in the methodology, supporting its continued use for question answering evaluation. Nonetheless, redefining the specific task to be performed so that it more closely matches an actual user task does appear warranted.
Citation
The TREC-8 Question Answering Track Evaluation
Keywords
evaluation, human assessors, natural language processing, question answering, task-based training, think-aloud observations, TREC
Citation
Voorhees, E. and Tice, D. (2000), The TREC-8 Question Answering Track Evaluation, The TREC-8 Question Answering Track Evaluation, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=151446 (Accessed May 4, 2026)