Author(s): Peter Rankel; John M. Conroy; Hoa T. Dang; Ani Nenkova

Title: A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art

Published: August 05, 2013

Abstract: How good are automatic content metrics for news summary evaluation? Here we provide a detailed answer to this question, with a particular focus on assessing the ability of automatic evaluations to identify statistically significant differences present in manual evaluation of content. Using four years of TAC data, we analyze the performance of eight ROUGE variants in terms of accuracy, precision, and recall in finding significantly different systems. Our experiments show that some of the neglected variants of ROUGE, based on higher-order n-grams and syntactic dependencies, are most accurate across the years; the commonly used R-1 scores find too many significant differences. We also test combinations of ROUGE variants and find that they considerably improve the accuracy of automatic prediction.

Conference: 51st Annual Meeting of the Association for Computational Linguistics

Proceedings: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics

Pages: pp. 131-136

Dates: August 4-9, 2013

Research Areas: Data and Informatics

PDF version: 136 KB
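The evaluation setup described in the abstract can be sketched in a few lines: for every pair of summarization systems, compare whether an automatic metric (e.g. a ROUGE variant) declares a statistically significant difference against whether the manual evaluation does, then score the metric's predictions with accuracy, precision, and recall. This is a minimal illustration under stated assumptions, not the paper's actual code; the function name and all data below are invented.

```python
# Hypothetical sketch: score an automatic metric's significance predictions
# against manual significance judgments, one boolean per system pair.

def agreement_scores(auto_sig, manual_sig):
    """auto_sig, manual_sig: parallel lists of booleans, True if that
    system pair was judged significantly different."""
    tp = sum(a and m for a, m in zip(auto_sig, manual_sig))
    fp = sum(a and not m for a, m in zip(auto_sig, manual_sig))
    fn = sum(m and not a for a, m in zip(auto_sig, manual_sig))
    tn = sum((not a) and (not m) for a, m in zip(auto_sig, manual_sig))
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Invented example with 6 system pairs: a metric that over-predicts
# significance (as the abstract says R-1 tends to) gets perfect recall
# but reduced precision and accuracy.
manual = [True, True, False, False, True, False]
metric = [True, True, True, True, True, False]
acc, prec, rec = agreement_scores(metric, manual)
# acc = 4/6, prec = 3/5, rec = 3/3
```

The design choice mirrors the abstract: treating significance detection as a binary prediction task makes "finds too many significant differences" measurable as high recall with low precision.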