Assessing the Effect of Inconsistent Assessors on Summarization Evaluation
Published
2012
Author(s)
Karolina K. Owczarzak, Peter Rankel, Hoa T. Dang, John M. Conroy
Abstract
We investigate the consistency of human assessors involved in summarization evaluation to understand its effect on system ranking and automatic evaluation techniques. Using Text Analysis Conference data, we measure annotator consistency based on human scoring of summaries for Responsiveness, Readability, and Pyramid scoring. We identify inconsistencies in the data and measure to what extent these inconsistencies affect the ranking of automatic summarization systems. Finally, we examine the stability of automatic scoring metrics (ROUGE and CLASSY) with respect to the inconsistent assessments.
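The abstract describes comparing the system rankings induced by different (possibly inconsistent) human assessments. As a minimal illustrative sketch of that kind of analysis (not code from the paper), the snippet below uses Kendall's tau from SciPy to measure rank agreement between two hypothetical assessor groups; the system names and scores are invented for illustration.

```python
# Illustrative sketch only: given average human scores (e.g., Responsiveness) for the
# same summarization systems from two assessor groups, compare the system rankings
# they induce with Kendall's tau. A low correlation would suggest that assessor
# inconsistency is large enough to change the system ranking.
from scipy.stats import kendalltau

# Hypothetical per-system average scores from two disjoint assessor groups.
scores_group_a = {"sys1": 3.8, "sys2": 3.1, "sys3": 2.9, "sys4": 3.5, "sys5": 2.2}
scores_group_b = {"sys1": 3.6, "sys2": 3.3, "sys3": 2.5, "sys4": 3.4, "sys5": 2.4}

systems = sorted(scores_group_a)            # fixed system order for both score lists
a = [scores_group_a[s] for s in systems]
b = [scores_group_b[s] for s in systems]

tau, p_value = kendalltau(a, b)             # rank agreement between the two rankings
print(f"Kendall's tau between assessor groups: {tau:.3f} (p = {p_value:.3f})")
```

The same comparison could be repeated with automatic scores (e.g., ROUGE) in place of one assessor group to probe how stable a metric's ranking is against variation in the human assessments.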
Proceedings Title
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics
Owczarzak, K., Rankel, P., Dang, H. and Conroy, J. (2012), Assessing the Effect of Inconsistent Assessors on Summarization Evaluation, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, KR, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=911315 (Accessed October 8, 2025)