
The 2019 NIST Speaker Recognition Evaluation CTS Challenge

Author(s)

Seyed Omid Sadjadi, Craig S. Greenberg, Elliot Singer, Douglas Reynolds, Lisa Mason, Jaime Hernandez-Cordero

Abstract

In 2019, the U.S. National Institute of Standards and Technology (NIST) conducted a leaderboard-style speaker recognition challenge using conversational telephone speech (CTS) data extracted from the unexposed portion of the Call My Net 2 (CMN2) corpus previously used in the 2018 Speaker Recognition Evaluation (SRE). The SRE19 CTS Challenge was organized in a similar manner to SRE18, except that it offered only the open training condition. In addition, as in the NIST i-vector challenge, the evaluation set consisted of two subsets: a progress subset and a test subset. The progress subset comprised 30% of the trials and was used to monitor progress on the leaderboard, while the remaining 70% of the trials formed the test subset, which was used to generate the official final results determined at the end of the challenge. Challenge participants did not know which subset (i.e., progress or test) a given trial belonged to, and each system submission had to contain outputs for all trials. The CTS Challenge also served as a prerequisite for entrance to the main SRE19, whose primary task was audio-visual person recognition. A total of 67 organizations (forming 51 teams) from academia and industry participated in the CTS Challenge and submitted 1347 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for all primary conditions in the CTS Challenge. Compared to the CTS track of SRE18, the SRE19 CTS Challenge results indicate remarkable improvements in performance, which are mainly attributed to 1) the availability of large amounts of in-domain development data from a large number of labeled speakers, 2) speaker representations (aka embeddings) extracted using extended and more complex end-to-end neural network frameworks, and 3) effective use of the provided large development set.
Conference Dates
November 1-5, 2020
Conference Location
Tokyo
Conference Title
The Speaker and Language Recognition Workshop: Odyssey 2020

Keywords

Artificial intelligence, human language technology, NIST SRE, speaker recognition, speaker verification, statistical analysis

Citation

Sadjadi, S., Greenberg, C., Singer, E., Reynolds, D., Mason, L. and Hernandez-Cordero, J. (2020), The 2019 NIST Speaker Recognition Evaluation CTS Challenge, The Speaker and Language Recognition Workshop: Odyssey 2020, Tokyo, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=929506 (Accessed March 18, 2024)
Created May 18, 2020, Updated March 27, 2020