An Assessment of the Accuracy of Automatic Evaluation in Summarization

Karolina K. Owczarzak; John M. Conroy; Hoa T. Dang; Ani Nenkova

Published

June 8, 2012

Author(s)

Karolina K. Owczarzak, John M. Conroy, Hoa T. Dang, Ani Nenkova

Abstract

Automatic evaluation has greatly facilitated system development in summarization. At the same time, the use of automatic evaluation has been viewed with mistrust by many, as its accuracy and correct application are not well understood. In this paper we provide an assessment of the automatic evaluations used for multi-document summarization of news. We outline our recommendations about how any evaluation, manual or automatic, should be used to find statistically significant differences between summarization systems. We identify the reference automatic evaluation metrics, ROUGE 1 and 2, that appear to best emulate human pyramid and responsiveness scores on three years of NIST evaluations. We then demonstrate the accuracy of these metrics in reproducing human judgements about the relative content quality of pairs of systems, and present an empirical assessment of the relationship between statistically significant differences between systems and the relative size of the improvement in automatic evaluation scores. Finally, we present a case study of how new metrics should be compared to the reference evaluation, as we search for even more accurate automatic measures.

Proceedings Title

Proceedings of the Workshop on Evaluation Metrics and System Comparison for Automatic Summarization

Conference Dates

June 8, 2012

Conference Location

Montreal, CA

Conference Title

Workshop on Evaluation Metrics and System Comparison for Automatic Summarization

Pub Type

Conferences

Keywords

Evaluation, Summarization

Information technology

Citation

Owczarzak, K., Conroy, J., Dang, H. and Nenkova, A. (2012), An Assessment of the Accuracy of Automatic Evaluation in Summarization, Proceedings of the Workshop on Evaluation Metrics and System Comparison for Automatic Summarization, Montreal, CA, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=911393 (Accessed February 26, 2026)

Created June 7, 2012, Updated October 12, 2021
