Different Structures for Evaluating Answers to Complex Questions: Pyramids Are Stable, and So Are Human Assessors
Published
Author(s)
Hoa T. Dang, Jimmy Lin
Abstract
The idea of "nugget pyramids" has recently been introduced as a refinement to the nugget-based methodology used to evaluate answers to complex questions in the Text Retrieval Conference (TREC) Question Answering (QA) tracks. This work examines data from the TREC 2006 QA track, the first large-scale deployment of the nugget pyramid method, and shows that combining judgments of nugget importance from multiple assessors increases the stability and discriminative power of the evaluation while adding only a small manual assessment cost. We address the desire to maintain a model of real users for the question answering task by exploring different ways in which assessor opinions can be combined. We show that the nugget pyramid evaluation is highly correlated with other evaluations that do maintain a user model, and is therefore an appropriate method for evaluating an end-user task such as question answering.
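To make the idea of combining assessor judgments concrete, the following is a minimal sketch of a pyramid-style nugget score. It assumes (this record does not spell it out) that each nugget's weight is the number of assessors who marked it "vital", normalized by the largest such count; that recall is the weighted fraction of nuggets a response matches; and that precision uses a length allowance of 100 non-whitespace characters per matched nugget, combined into F with beta = 3 as in the TREC QA "other" question evaluations. The function names `pyramid_weights` and `pyramid_f` are hypothetical, and this is an illustration, not the authors' scoring code.

```python
from typing import Dict, Set


def pyramid_weights(vital_votes: Dict[str, int]) -> Dict[str, float]:
    """Assumed scheme: weight = vital votes / max vital votes over all nuggets."""
    top = max(vital_votes.values())
    if top == 0:
        raise ValueError("at least one nugget must receive a vital vote")
    return {nugget: votes / top for nugget, votes in vital_votes.items()}


def pyramid_f(
    weights: Dict[str, float],
    matched: Set[str],
    response_length: int,
    beta: float = 3.0,
    allowance_per_nugget: int = 100,  # assumed: non-whitespace chars per matched nugget
) -> float:
    """Nugget F-score with pyramid-weighted recall (illustrative sketch)."""
    # Recall: share of the total nugget weight covered by the response.
    recall = sum(weights[n] for n in matched) / sum(weights.values())
    # Precision approximated by a length allowance rather than counting
    # non-nugget content, which assessors do not mark explicitly.
    allowance = allowance_per_nugget * len(matched)
    if response_length <= allowance:
        precision = 1.0
    else:
        precision = 1.0 - (response_length - allowance) / response_length
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical votes from 8 assessors for three nuggets.
    w = pyramid_weights({"n1": 8, "n2": 4, "n3": 0})
    print(w)  # {'n1': 1.0, 'n2': 0.5, 'n3': 0.0}
    print(pyramid_f(w, {"n1", "n3"}, response_length=150))
```

Because weights come from votes across several assessors rather than a single assessor's binary vital/okay label, a nugget that only one assessor considers vital still earns partial credit, which is the intuition behind the stability gains the abstract reports.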
Proceedings Title
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
Conference Location
Prague, Czech Republic
Conference Title
45th Annual Meeting of the Association for Computational Linguistics
Dang, H. and Lin, J. (2007), Different Structures for Evaluating Answers to Complex Questions: Pyramids Are Stable, and So Are Human Assessors, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=51158 (Accessed October 15, 2024)