|Author(s):||Gregory A. Sanders; Sherri Condon; Mark Arehart; Dan Parvaz; Christy Doran; John Aberdeen|
|Title:||Evaluation of 2-Way Iraqi Arabic-English Speech Translation Systems Using Automated Metrics|
|Published:||September 22, 2011|
|Abstract:||The Defense Advanced Research Projects Agency (DARPA) Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program faced many challenges in applying automated measures of translation quality to Iraqi Arabic-English speech translation dialogues. Features of speech data in general, and of Iraqi Arabic data in particular, undermine basic assumptions of automated measures that depend on matching system outputs to reference translations. We show that scores for translation into Iraqi Arabic correlate more highly with human judgments when they are computed from normalized system outputs and reference translations. Orthographic normalization, lexical normalization, and operations involving light stemming each yielded higher correlations with human judgments. Another challenge for the use of automated metrics in the TRANSTAC program was the relatively small amount of test data available for evaluation. We present evidence that the datasets of 500-600 utterances per language that we used to evaluate the systems are adequate for scoring and comparing systems.|
|Pages:||pp. 159-176|
|Keywords:||Arabic, machine translation, evaluation, automated metrics, speech translation|
|Research Areas:||Information Technology|
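The abstract reports that normalizing Iraqi Arabic system outputs and references before scoring raised metric-human correlation, but it does not list the exact rules. The following is a minimal sketch under stated assumptions, not the authors' pipeline. Orthographic normalization here uses common Arabic conventions: stripping diacritics and tatweel, unifying alef variants, and mapping alef maqsura to yaa and taa marbuta to haa. Lexical normalization is a hypothetical, empty variant-to-canonical lookup table (`LEXICAL_VARIANTS`). Light stemming is a toy rule that strips one of the prefixes "وال", "ال", or "و". The sketch also includes a Pearson correlation to compare per-system metric scores with human judgments; the abstract does not say which correlation statistic the paper used. All function names are hypothetical.

```python
import math
import re

# Arabic diacritics (fathatan..sukun, dagger alef) plus tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")

# Assumed orthographic mappings; the paper's actual rules may differ.
_ORTHO_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yaa
    "\u0629": "\u0647",  # taa marbuta -> haa
})

# Hypothetical lexical normalization table: spelling variant -> canonical form.
# Keys should already be orthographically normalized.
LEXICAL_VARIANTS: dict[str, str] = {}

# Toy light-stemming prefixes, longest first: "wal-", "al-", "wa-".
_PREFIXES = ("\u0648\u0627\u0644", "\u0627\u0644", "\u0648")


def orthographic_normalize(text: str) -> str:
    """Strip diacritics/tatweel, then unify common letter variants."""
    return _DIACRITICS.sub("", text).translate(_ORTHO_MAP)


def light_stem(token: str, min_len: int = 2) -> str:
    """Remove one known prefix, but only if at least min_len letters remain."""
    for prefix in _PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_len:
            return token[len(prefix):]
    return token


def normalize(text: str) -> list[str]:
    """Apply orthographic, lexical, and light-stemming steps token by token."""
    tokens = orthographic_normalize(text).split()
    tokens = [LEXICAL_VARIANTS.get(t, t) for t in tokens]
    return [light_stem(t) for t in tokens]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between metric scores and human judgments."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

In use, both the system output and the reference would pass through `normalize` before n-gram matching, so that spelling variants such as "المدرسة" and "المدرسه" count as a match. The normalized metric's per-system scores would then go to `pearson` alongside human ratings. A real light stemmer needs guards beyond `min_len`, because stripping "و" can remove a root letter.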