OpenASR Challenge

The goal of the OpenASR (Open Automatic Speech Recognition) Challenge is to assess the state of the art of ASR technologies for low-resource languages.

The OpenASR Challenge is an open challenge created out of the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program, which encompasses more tasks, including CLIR (cross-language information retrieval), domain classification, and summarization. For every year of MATERIAL, NIST supports a simplified, smaller-scale evaluation open to all, focusing on a particular technology aspect of MATERIAL. CLIR technologies were the focus of the first open challenge in 2019, OpenCLIR. Since 2020, the focus has been on ASR. The capabilities tested in the open challenges are expected to ultimately support the MATERIAL task of effective triage and analysis of large volumes of text and audio content in a variety of less-studied languages.

 

OpenASR21 Challenge

The second OpenASR Challenge associated with MATERIAL, OpenASR21, opened for registration August 9, 2021. The evaluation period will take place in November 2021. OpenASR21 features ASR evaluation opportunities for 15 low-resource languages:

  • All ten languages from OpenASR20
  • Five NEW languages for OpenASR21

For the languages from OpenASR20, the same evaluation datasets from 2020 will be used, consisting of conversational telephone speech (CTS) data. For the five new languages, the main evaluation dataset will also consist of CTS data. These datasets will be scored (where applicable) case-insensitively.

NEW for OpenASR21 is case-sensitive scoring for three of the new languages, as indicated below. Case-sensitive scoring will be applied to system output on separate evaluation datasets, drawn from a mix of genres for these languages, in order to assess low-resource ASR performance specifically on proper nouns.
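The difference between the two scoring modes can be illustrated with a minimal sketch. It assumes word error rate (WER) as the metric, which this page does not state (the evaluation plan is the authoritative source), and it is not the NIST scoring tool. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str, case_sensitive: bool = False) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: assumes WER and whitespace tokenization, neither of
    which is specified on this page.
    """
    if not case_sensitive:
        # Case-insensitive scoring: fold case before comparing words.
        reference, hypothesis = reference.casefold(), hypothesis.casefold()
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "Amina flew to Nairobi"
hypothesis = "amina flew to nairobi"
print(word_error_rate(reference, hypothesis))                       # 0.0
print(word_error_rate(reference, hypothesis, case_sensitive=True))  # 0.5
```

In this sketch the lowercased proper nouns count as two substitutions only when case matters, which is why case-sensitive scoring is informative about proper-noun recognition.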

OpenASR21 languages:

  • Amharic
  • Cantonese
  • NEW Georgian
  • NEW Farsi
  • Guarani
  • Javanese
  • NEW Kazakh (including additional evaluation dataset for case-sensitive scoring)
  • Kurmanji Kurdish
  • Mongolian
  • Pashto
  • Somali
  • NEW Swahili (including additional evaluation dataset for case-sensitive scoring)
  • NEW Tagalog (including additional evaluation dataset for case-sensitive scoring)
  • Tamil
  • Vietnamese

OpenASR21 will be implemented as a track of NIST’s OpenSAT (Open Speech Analytic Technologies) evaluation series, using the OpenSAT web server for registration, data access, submission, and scoring purposes.

For more details, please refer to the OpenASR21 Challenge Evaluation Plan in the Documentation and Resources section below.

Schedule

Milestone | Date
Evaluation plan release | July 2021
Registration period | August 9 – October 15, 2021
Development period | August 9 – November 2, 2021 (potentially longer but excluding evaluation period)
- Build and Dev datasets release | August 9, 2021
- Scoring server accepts submissions for Dev datasets | August 30 – November 2, 2021 (potentially longer but excluding evaluation period)
Registration closes | October 15, 2021
Evaluation period | November 3 – 10, 2021
- Release of Eval datasets | November 3, 2021
- Scoring server accepts submissions | November 4 – 10, 2021
- System output due at NIST | November 10, 2021
System description due at NIST | November 19, 2021

Registration

Registration opened on August 9, 2021. Please register via the OpenSAT web server.

Documentation and Resources

 

OpenASR20 Challenge

The first OpenASR Challenge associated with MATERIAL, OpenASR20, opened for registration in July 2020, with an evaluation period in November 2020. It featured ASR evaluation opportunities for these ten low-resource languages:

  • Amharic
  • Cantonese
  • Guarani
  • Javanese
  • Kurmanji Kurdish
  • Mongolian
  • Pashto
  • Somali
  • Tamil
  • Vietnamese

It was implemented as a track of NIST’s OpenSAT (Open Speech Analytic Technologies) evaluation series, using the OpenSAT web server for registration, data access, submission, and scoring purposes.

The evaluation plan posted in the Documentation and Resources section below describes the OpenASR20 Challenge in detail.

Registration

Registration for the OpenASR20 Challenge is now closed.

Documentation and Resources

Results

The OpenASR20 evaluation was conducted in November 2020. Please see the OpenASR20 Challenge Results page.

OpenASR20 participants, as well as others working in the low-resource ASR problem space who did not participate in OpenASR20, are strongly encouraged to submit their work to an OpenASR special session at INTERSPEECH 2021. Please see the OpenASR20 and Low-Resource ASR Special Session at INTERSPEECH 2021 Call for Papers.

Contact

Please email openasr_poc [at] nist.gov for any questions or comments regarding the OpenASR Challenge. 

Created June 10, 2020, Updated September 30, 2021