The TREC-8 Question Answering (QA) Track was the first large-scale evaluation of domain-independent question answering systems. In addition to fostering research on the QA task, the track was used to investigate whether the evaluation methodology used for document retrieval is appropriate for a different natural language processing task. As with document relevance judging, assessors had legitimate differences of opinion as to whether a response actually answers a question, but comparative evaluation of QA systems was stable despite these differences. Creating a reusable QA test collection is fundamentally more difficult than creating a document retrieval test collection since the QA task has no equivalent to document identifiers.
Keywords: evaluation, human assessors, natural language processing, question answering, task-based training, think-aloud observations, TREC

Voorhees, E. and Tice, D. Building a Question Answering Test Collection. ACM Special Interest Group in Information Retrieval (SIGIR).
(Accessed December 10, 2023)