Information Technology Laboratory / Information Access Division

Multimodal Information Group

Open Speech Analytic Technologies (OpenSAT) Evaluation Series

OpenSAT20 reopened on June 18, and the schedule has been updated.

The next evaluation will be the OpenSAT20 Evaluation. OpenSAT20 will continue with the same tasks: Automatic Speech Recognition (ASR), Speech Activity Detection (SAD), and Keyword Search (KWS).

The data domain for the OpenSAT20 Evaluation will be simulated public safety communications spoken in English. The evaluation data will be extracted from unexposed portions of the SAFE-T corpus, which was collected by the Linguistic Data Consortium (LDC) and initially made available for the OpenSAT19 Evaluation. The audio recordings in the SAFE-T corpus contain speech, potentially with increased vocal effort induced by first-responder-type background noise conditions, and are expected to be challenging for systems to process with a high degree of accuracy.

NIST intends to continue with this public-safety speech corpus in the OpenSAT series to measure year-to-year progress in system performance. The goal of the NIST Open Speech Analytic Technologies (OpenSAT) evaluation series is to provide broad support for the advancement of speech analytic technologies by including multiple speech analytic tasks and multiple data domains. Developers can choose from one to all tasks and from one to all data domains.

OpenSAT20 will be organized in a similar manner to OpenSAT19, with the following exceptions:

Only one data domain (public safety communications) will be available (as opposed to three domains).
The evaluation data package will consist of unexposed eval data (denoted as Test set) plus OpenSAT19 eval data (denoted as Progress set).
Live public leaderboards will display scores for the Progress set for each task, and participants will have the option to anonymize their team names on the public leaderboard.
System descriptions will be available to all participants from the OpenSAT20 web site.
For those who cannot physically attend the workshop, provisions will be made to attend remotely.

Current 2020 Schedule - updated

OpenSAT20 registration available until July 31, 2020
Training, Development, and Evaluation data available until July 31
Scores posted to leaderboards for the Progress set portion of the evaluation data until July 31
Last date to upload system output to NIST for scoring is July 31
Scores for the Test set portion of the evaluation data made available after July 31 (if a system description is uploaded)
Virtual workshop will be held on September 16-17, 2020. The last day to register for the workshop is September 4.

Go to the OpenSAT website for more information and to register.

OpenSAT20 Objectives:

for NIST to measure performance of the state-of-the-art speech analytics technologies in the public safety communications domain,
to provide a forum for the speech analytics community to further test and develop multiple technologies in parallel with other developers using a common data set, and
to enable opportunities for sharing and leveraging knowledge by bringing together developers whose primary focus may be the same analytic task or one of the other two tasks.

Download the 2020 OpenSAT Evaluation Plan V1.6 (pdf). Updated July 1, 2020

Send email to opensat_poc [at] nist.gov to be added to the mailing list, to receive updates, to ask questions, or to leave comments.

OpenSAT19 (schedule updated 2/06/19)

OpenSAT19 Evaluation Plan (Updated 3/28/2019)
03/29/2018 - 06/14/2019 Development data release (updated dates)
06/17/2019 - 07/01/2019 Evaluation data release (updated date)
08/20/2019 - 08/21/2019 Post Evaluation Workshop

Tasks
Speech Activity Detection (SAD)
Automatic Speech Recognition (ASR)
Keyword Search (KWS)

Data
For SAD, ASR, and KWS tasks: low-resource language (Pashto language), from the IARPA Babel collection
For SAD and KWS tasks: audio extracted from amateur online videos (English language), from the Video Annotation for Speech Technologies (VAST) collection
For SAD, ASR, and KWS tasks: simulated public safety communications (English language), from the PSC collection

OpenSAT Pilot 2017

Tasks
Speech Activity Detection (SAD)
Automatic Speech Recognition (ASR)
Keyword Search (KWS)

Data
For SAD, ASR, and KWS tasks: low-resource language (Pashto language), from the IARPA Babel collection
For the SAD task only: audio extracted from YouTube videos (Arabic, Mandarin, and English languages), from the Video Annotation for Speech Technologies (VAST) collection
For SAD, ASR, and KWS tasks: first responder/dispatcher operational recordings (English language), from the June 18, 2007, Sofa Super Store fire in Charleston, South Carolina

Documentation
Open Speech Analytic Technologies Pilot (OpenSAT Pilot) Evaluation Plan
Open Speech Analytic Technologies Pilot (OpenSAT Pilot) Evaluation Report

Created September 30, 2016, Updated August 28, 2020
