TAC 2008 Opinion Summarization Task Guidelines
The goal of the TAC Summarization track is to foster research on systems that produce summaries of documents. The focus is on systems that can produce well-organized, fluent summaries of text.
The 2008 Opinion Summarization pilot task is to generate well-organized, fluent summaries of opinions about specified targets, as found in a set of blog documents. As in past query-focused summarization tasks, each summary will be guided by several complex questions about the target, none of which can be answered simply with a named entity (or even a list of named entities). The input to the summarization task will come from the TAC 2008 QA task and will comprise a target, some "squishy list" questions about the target, and a set of documents that contain answers to the questions. The output will be one summary per target that summarizes the answers to the questions. Rather than evaluating content against a set of model summaries, each submitted summary will be evaluated against a nugget Pyramid created during the evaluation of submissions to the QA task.
Much of the test data and evaluation metrics for the opinion summarization task will be in common with the opinion QA task. For details about test questions, documents, and "squishy list" evaluation metrics, opinion summarization participants are invited to read the following:
The test questions for the opinion summarization task will be available on the TAC 2008 Summarization home page on August 22. Submissions are due at NIST on or before September 2, 2008. Each team may submit up to three runs (submissions) for the opinion summarization pilot task, ranked by priority. NIST will judge the first-priority run from each team and (if resources allow) up to 2 additional runs from each team. Runs may be either manual or automatic.
II. Test data
The test questions and documents will be a subset of the test data for the TAC 2008 QA task. The opinion summarization test data will consist of:
III. Submission guidelines
A submission to the opinion summarization task will comprise exactly one file per summary, where the name of each summary file is the numeric ID of the target of the summary. Please include a file for each summary, even if the file is empty. The number of non-whitespace characters in the summary must not exceed 7000 times the number of squishy list questions for the target of the summary. Each file will be read and assessed as a plain text file, so no special characters or markup are allowed. The files must be placed in a directory whose name is the concatenation of the Team ID and the priority of the run. (For example, if the Team ID is "SYSX", then the directory name for the first-priority run should be "SYSX1".) Please package the directory in a tarfile and gzip the tarfile before submitting it to NIST.
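The packaging rules above can be sketched in a few lines of Python. This is only an illustration and not the checking routine that NIST will release: the function names, the `.tar.gz` archive name, and the way the number of questions is passed in are all assumptions made for the example.

```python
import tarfile
from pathlib import Path

# Limit stated in the guidelines: 7000 non-whitespace characters per question.
CHARS_PER_QUESTION = 7000


def non_whitespace_length(text: str) -> int:
    """Count the characters in a summary that are not whitespace."""
    return sum(1 for ch in text if not ch.isspace())


def within_limit(text: str, num_questions: int) -> bool:
    """True if the summary respects the length limit for its target."""
    return non_whitespace_length(text) <= CHARS_PER_QUESTION * num_questions


def run_dir_name(team_id: str, priority: int) -> str:
    """Directory name: the Team ID followed by the run priority (1-3)."""
    if priority not in (1, 2, 3):
        raise ValueError("priority must be 1, 2, or 3")
    return f"{team_id}{priority}"


def package_run(run_dir: Path) -> Path:
    """Create a gzipped tarfile of the run directory and return its path.

    The archive name (<directory>.tar.gz) is an assumption; the guidelines
    only require a gzipped tarfile of the directory.
    """
    archive = run_dir.with_name(run_dir.name + ".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(run_dir, arcname=run_dir.name)
    return archive
```

For a team with ID "SYSX", `run_dir_name("SYSX", 1)` gives "SYSX1", and a summary for a target with two squishy list questions may contain at most 14000 non-whitespace characters.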
Each team may submit up to three runs, ranked by priority (1-3). NIST will evaluate the first-priority run from each team. If resources allow, NIST will evaluate an additional 1 or 2 runs from each team.
NIST will post the test data on the TAC Summarization web site on August 22, and results must be submitted to NIST by 11:59 p.m. (EDT) on September 2, 2008. Results are submitted to NIST using an automatic submission procedure. Details about the submission procedure will be emailed to the email@example.com mailing list when the test data is released. At that time, NIST will also release a script that checks submission files for common errors, such as invalid IDs and missing summaries. Participants should check their runs with this script before submitting them to NIST, because the automatic submission procedure will reject any submission in which the script detects errors.
Submissions may be either manual or automatic. For automatic runs, no changes can be made to any component of the summarization system or any resource used by the system in response to the current year's test data (targets, questions, or documents). If any part of the system (including resources used) is changed or tuned in response to the current year's test data, then the resulting run must be classified as a manual run. At the time of submission, each team will be asked to fill out a form stating:
Rather than evaluating content against a set of model summaries, each summary will be evaluated for content using the nugget Pyramid method used to evaluate the squishy list questions in the TAC QA task. The assessor will use the list(s) of acceptable nuggets previously created for the question(s) in the QA track and count the nuggets contained in each summary. Each nugget that is present will be counted only once. Scoring will be the same as for the QA squishy list score, but likely with a lower value for beta (i.e., recall will be weighted less heavily than in the QA task).
The assessor will also give an overall responsiveness score to each summary. The overall responsiveness score will be an integer from 1 to 10 (10 being best) and will reflect both content and linguistic quality. NIST will use the overall responsiveness score to determine appropriate parameters for scoring, including an appropriate value for beta.
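The role of beta can be illustrated with the standard F-measure, in which beta controls how heavily recall is weighted relative to precision. The sketch below is only an illustration: the exact definitions of precision and recall for the squishy list score are given in the QA task guidelines and are not reproduced here, and the function names and example values are hypothetical.

```python
def count_nuggets(matched_nugget_ids):
    """Count matched nuggets, counting each nugget only once."""
    return len(set(matched_nugget_ids))


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-measure; a larger beta gives recall more weight."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Hypothetical summary with low precision and high recall.
precision, recall = 0.2, 0.8
score_low_beta = f_beta(precision, recall, beta=1.0)
score_high_beta = f_beta(precision, recall, beta=3.0)
```

With these values the score at the lower beta is smaller than the score at the higher beta, because lowering beta shifts weight away from recall and toward precision.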
Last updated: Tuesday, 19-Oct-2010 11:02:53 EDT