NIST coordinates MetricsMaTr, a series of research challenge events for machine translation (MT) metrology, promoting the development of innovative, even revolutionary, MT metrics. MetricsMaTr focuses entirely on MT metrics.
NIST provides the evaluation infrastructure; the source files are MT system output, and participants develop MT metrics that assess the quality of those translations. The metrics are run on the test set at NIST in two tracks: one using a single reference translation and one using multiple reference translations.
The goal is to create intuitively interpretable automatic metrics which correlate highly with human assessment of MT quality. Different types of human assessment are used.
There are several drawbacks to the current methods employed for the evaluation of MT technology:
These problems, and the need to overcome them through the development of improved automatic (or even semi-automatic) metrics, have been a constant point of discussion at past NIST MT evaluation events.
MetricsMaTr aims to provide a platform to address these shortcomings. Specifically, the goals of MetricsMaTr are:
The MetricsMaTr challenge is designed to appeal to a wide and varied audience, including researchers in MT technology and metrology, acquisition programs such as MFLTS, and commercial vendors. We welcome submissions from a wide range of disciplines, including computer science, statistics, mathematics, linguistics, and psychology. NIST encourages submissions from participants not currently active in the field of MT.
The most recent MetricsMaTr challenge was MetricsMaTr10.
The MetricsMaTr evaluation tests automatic metric scores for correlation with human assessments of machine translation quality across a variety of languages, data genres, and human assessment types, which produces a large number of results. Below, we provide a very high-level summary of these extensive results.
The table below presents Spearman's rho correlations of automatic metric scores with human assessments on data with English as the target language (drawn from NIST OpenMT, DARPA GALE, and DARPA TRANSTAC test sets), limited to:
| Evaluation | Segment level (1 ref) | Document level (1 ref) | System level (1 ref) | Segment level (4 refs) | Document level (4 refs) | System level (4 refs) |
|---|---|---|---|---|---|---|
| MetricsMaTr10 | SVM_rank rho=0.69 | METEOR-next-rank rho=0.84 | METEOR-next-rank rho=0.92 | SVM_rank rho=0.74 | i_letter_BLEU rho=0.85 | SEPIA rho=0.93 |
| MetricsMaTr08 | TERp rho=0.68 | METEOR-v0.7 rho=0.84 | CDer rho=0.90 | SVM_RANK rho=0.72 | CDer rho=0.85 | ATEC3 rho=0.93 |
| Baseline | METEOR-v0.6 rho=0.68 | NIST rho=0.81 | TER-v0.7.25 rho=0.89 | METEOR-v0.6 rho=0.72 | NIST rho=0.84 | NIST rho=0.93 |
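To make the summary concrete, the sketch below shows one way such correlations can be computed with Spearman's rho: segment-level correlation ranks individual metric scores against the corresponding human judgments, while system-level correlation first aggregates scores per system (document-level grouping works the same way). This is only a minimal illustration, not NIST's scoring pipeline; the aggregation by simple averaging is an assumption, and every system name and score in it is invented.

```python
# Minimal sketch of the kind of correlation analysis summarized above; this is
# not NIST's official scoring code, and all systems, documents, and scores
# below are invented purely for illustration.
from collections import defaultdict
from statistics import mean

from scipy.stats import spearmanr

# Hypothetical per-segment records: (system, document, metric_score, human_score).
segments = [
    ("sysA", "doc1", 0.41, 5.0), ("sysA", "doc1", 0.35, 4.0), ("sysA", "doc2", 0.52, 6.0),
    ("sysB", "doc1", 0.30, 3.0), ("sysB", "doc1", 0.28, 3.5), ("sysB", "doc2", 0.44, 5.0),
    ("sysC", "doc1", 0.47, 4.5), ("sysC", "doc2", 0.39, 4.0),
]

# Segment level: rank each segment's metric score against its human assessment.
rho_seg, _ = spearmanr([m for _, _, m, _ in segments],
                       [h for _, _, _, h in segments])

# System level: aggregate per system (simple averaging here, as an assumption),
# then correlate the aggregates. Document level works analogously, grouping by document.
by_system = defaultdict(list)
for system, _, m, h in segments:
    by_system[system].append((m, h))
rho_sys, _ = spearmanr([mean(m for m, _ in pairs) for pairs in by_system.values()],
                       [mean(h for _, h in pairs) for pairs in by_system.values()])

print(f"segment-level rho = {rho_seg:.2f}")
print(f"system-level rho  = {rho_sys:.2f}")
```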
mt_poc [at] nist.gov