CALL FOR PAPERS

NAACL-HLT 2012 Workshop on Evaluation Metrics and System Comparison for Automatic Summarization

June 8, 2012
Montreal, Quebec, Canada
http://www.nist.gov/tac/2012/WEAS/

WORKSHOP DESCRIPTION

Interest in summarization research has grown steadily over the past decade, with numerous new methods proposed for generic and topic-focused summarization of news. Other genres and domains, most notably those related to spoken input, have also become well established, including summarization of broadcast news, meetings, spoken conversations, and lectures.

At the same time, the development of evaluation metrics for summarization, and of resources for some genres and domains, has lagged behind. Manual evaluation protocols (Pyramid scores for content selection, scores for linguistic quality and overall responsiveness) show a considerable gap between human performance and the performance of systems for multi-document summarization of news. However, the widely used suite for automatic evaluation of content, ROUGE, shows a much narrower difference between machine and human performance and even fails to distinguish the two. For speech summarization, ROUGE likewise does not properly reflect the difference between human and automatic summarizers and, unlike for written news, correlates poorly with manual evaluation protocols. The challenge of automatically evaluating the linguistic quality of summaries has also only recently begun to be addressed.

Identifying the most competitive approaches to summarization has also become more challenging, partly due to confusing or inconsistent evidence from different test sets. Evaluating the same system configuration against several test sets will enable a fairer comparison between methods and will further stimulate research on automatic evaluation metrics.

For this workshop we invite submissions on a wide range of topics related to evaluation and system comparison in summarization. Topics of interest include:

+ system comparison on several evaluation datasets; for example, for multi-document summarization we will seek systems evaluated on multiple years of DUC/TAC data, with emphasis on measuring statistically significant differences
+ manual evaluation protocols for summarization in new genres where existing methods may not apply
+ manual evaluation protocols for abstractive summarization, which assess systems' text-to-text generation capabilities and reward successful generation
+ automatic evaluation metrics for linguistic quality
+ automatic evaluation metrics that better reflect the differences between human and machine performance
+ automatic metrics that significantly outperform ROUGE in content selection evaluation for news summarization
+ automatic metrics that perform evaluation without the use of human gold standards
+ analyses of domain and genre differences that expose weaknesses of currently adopted evaluation metrics, together with proposals for addressing those weaknesses

SUBMISSION

Submissions will consist of regular full papers of up to 8 pages, plus additional pages for references. Shorter papers are also welcome. All papers should be formatted following the NAACL-HLT 2012 guidelines. As the reviewing will be blind, papers must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...".
We encourage authors submitting papers on automatic methods for summarization and evaluation to evaluate their approaches on multiple publicly available datasets, such as those from DUC (http://duc.nist.gov/data.html) and the TAC Summarization track (http://www.nist.gov/tac/data/).

Both submission and review processes will be handled electronically using the Softconf submission software: https://www.softconf.com/naaclhlt2012/WEAS2012/

The submission deadline is April 1, 2012, at 11:59 PM Pacific Standard Time (GMT-8).

IMPORTANT DATES

April 1: Paper due date (EXTENDED deadline)
April 25: Notification of acceptance
May 4: Camera-ready deadline
June 8: Workshop at NAACL-HLT 2012

ORGANIZERS

John Conroy (IDA Center for Computing Sciences)
Hoa Dang (National Institute of Standards and Technology)
Ani Nenkova (University of Pennsylvania)
Karolina Owczarzak (National Institute of Standards and Technology)

PROGRAM COMMITTEE

Enrique Amigo (UNED, Madrid)
Giuseppe Carenini (University of British Columbia)
Katja Filippova (Google Research)
George Giannakopoulos (NCSR Demokritos)
Dan Gillick (University of California at Berkeley)
Min-Yen Kan (National University of Singapore)
Guy Lapalme (University of Montreal)
Yang Liu (University of Texas, Dallas)
Annie Louis (University of Pennsylvania)
Kathy McKeown (Columbia University)
Gabriel Murray (University of British Columbia)
Dianne O'Leary (University of Maryland)
Drago Radev (University of Michigan)
Steve Renals (University of Edinburgh)
Horacio Saggion (Universitat Pompeu Fabra)
Judith Schlesinger (IDA Center for Computing Sciences)
Josef Steinberger (European Commission Joint Research Centre)
Stan Szpakowicz (University of Ottawa)
Lucy Vanderwende (Microsoft Research)
Stephen Wan (CSIRO ICT Centre)
Xiaodan Zhu (National Research Council Canada)

CONTACT

Please contact us by email: weas2012@gmail.com