This paper presents the results of the WMT10 and MetricsMATR10 shared tasks, which included a translation task, a system combination task, and an evaluation task. We conducted a large-scale manual evaluation of 104 machine translation systems and 41 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality for 26 metrics. This year we also investigated increasing the number of human judgments by hiring non-expert annotators through Amazon's Mechanical Turk.
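For context, system-level correlation between automatic metrics and human judgments in WMT evaluations of this period was typically reported with Spearman's rank correlation coefficient. A minimal sketch of the standard formula, assuming no tied ranks, where d_i is the difference between the human rank and the metric rank of system i among n systems:

\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}

A value of 1 indicates that the metric ranks the systems identically to the human evaluation, and -1 indicates the reverse ordering.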
Proceedings Title: ACL 2010 Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Conference Dates: July 15-16, 2010
Conference Location: Uppsala, Sweden
Pub Type: Conferences
machine translation, mt, evaluation, metrology