Evaluation of Machine Translation (MT) technology is often tied to the requirement for tedious manual judgments of translation quality. While automated MT metrology continues to be an active area of research, a well-known and widely accepted standard metric is the manual human assessment of adequacy and fluency. Several software packages (RWTH, 2000; LDC, 2005) have been used to facilitate these judgments, but for the 2008 NIST Open MT Evaluation (NIST, 2008), NIST's Speech Group created an online software tool to accommodate the requirement for centralized data and distributed judges. This paper introduces the NIST TAP-ET application and reviews the reasoning underlying its design. Where available, analyses of data sets judged for Adequacy and Preference using the TAP-ET application are presented. TAP-ET is freely available for download and contains a variety of customizable features.
May 28-30, 2008
The sixth international conference on Language Resources and Evaluation, LREC 2008
Keywords: Machine Translation Evaluation, Evaluation Software, Human Assessments, Human Judgments, Adequacy, Preference