A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art

Peter Rankel; John M. Conroy; Hoa T. Dang; Ani Nenkova

Published

August 5, 2013

Author(s)

Peter Rankel, John M. Conroy, Hoa T. Dang, Ani Nenkova

Abstract

How good are automatic content metrics for news summary evaluation? Here we provide a detailed answer to this question, with a particular focus on assessing the ability of automatic evaluations to identify statistically significant differences present in manual evaluation of content. Using four years of TAC data, we analyze the performance of eight ROUGE variants in terms of accuracy, precision, and recall in finding significantly different systems. Our experiments show that some of the neglected variants of ROUGE, based on higher-order n-grams and syntactic dependencies, are most accurate across the years; the commonly used ROUGE-1 (R-1) scores find too many significant differences. We also test combinations of ROUGE variants and find that they considerably improve the accuracy of automatic prediction.
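
As a rough illustration of the evaluation criterion named in the abstract, the sketch below scores an automatic metric by how well its set of significantly different system pairs matches the set found by manual evaluation. It is a minimal sketch under stated assumptions, not the authors' procedure: the function name `significance_agreement`, the system labels, and the toy pair lists are all hypothetical, and the significance decisions are taken as given inputs because the abstract does not say which statistical test produces them. The direction of each difference is also ignored here.

```python
from itertools import combinations


def significance_agreement(systems, manual_sig, auto_sig):
    """Score an automatic metric's significance decisions against manual ones.

    manual_sig / auto_sig: iterables of system pairs judged significantly
    different by manual evaluation and by the automatic metric (hypothetical
    inputs; how significance is tested is outside this sketch).
    """
    # Every unordered pair of systems is one classification decision.
    pairs = [frozenset(p) for p in combinations(systems, 2)]
    manual = {frozenset(p) for p in manual_sig}
    auto = {frozenset(p) for p in auto_sig}

    tp = sum(1 for p in pairs if p in manual and p in auto)
    fp = sum(1 for p in pairs if p not in manual and p in auto)
    fn = sum(1 for p in pairs if p in manual and p not in auto)
    tn = len(pairs) - tp - fp - fn

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no pair is flagged.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy data, invented purely for illustration.
systems = ["A", "B", "C", "D"]
manual_sig = [("A", "B"), ("A", "C")]
auto_sig = [("A", "B"), ("B", "C"), ("C", "D")]
scores = significance_agreement(systems, manual_sig, auto_sig)
print(scores)
```

In this toy case the automatic metric flags three pairs but only one of them is also significant under manual evaluation, so its precision is low. That is the kind of behavior the abstract describes as finding "too many significant differences."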

Proceedings Title

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics

Conference Dates

August 4-9, 2013

Conference Location

Sofia, BG

Conference Title

51st Annual Meeting of the Association for Computational Linguistics

Pub Type

Conferences

Keywords

evaluation, summarization

Data and informatics

Citation

Rankel, P., Conroy, J., Dang, H. and Nenkova, A. (2013), A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, BG, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=914007 (Accessed April 17, 2024)

Created August 4, 2013, Updated October 12, 2021
