The 2016 NIST Speaker Recognition Evaluation

Seyed Omid Sadjadi, Timothée N. Kheyrkhah, Audrey N. Tong, Craig S. Greenberg, Douglas A. Reynolds, Elliot Singer, Lisa Mason, Jaime Hernandez-Cordero
Abstract
In 2016, NIST conducted the most recent in its ongoing series of speaker recognition evaluations (SRE) to foster research in robust text-independent speaker recognition and to measure the performance of current state-of-the-art systems, with a particular focus on domain and language mismatch scenarios. Compared to previous SREs, the 2016 evaluation introduced several new aspects: i) an entirely online evaluation platform, ii) fixed and specified training data, iii) a wider range of test-segment durations (uniformly distributed between 10 s and 60 s), and iv) labeled and unlabeled development (a.k.a. validation) sets for system hyperparameter tuning. Both the development and evaluation sets contained conversational telephony speech (CTS) collected outside North America, spoken in Tagalog and Cantonese (the major languages) as well as Cebuano and Mandarin (the minor languages). A total of 66 research organizations from industry and academia registered for the 2016 SRE, of which 43 teams submitted 121 valid system outputs. The evaluation results indicated that performance was significantly affected by several factors, including domain/channel, language, and duration mismatch. Effective use of the labeled and unlabeled development sets appeared essential for many top-performing systems. Finally, although mega fusion systems achieved the best results, the top single systems yielded 90% of that performance.
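The closing sentence of the abstract contrasts mega fusion systems with single systems. As an illustration of what score-level fusion means in this context, the following is a minimal sketch, assuming logistic-regression fusion of subsystem scores evaluated by equal error rate (EER); the synthetic scores, the fusion method, and the metric choice are illustrative assumptions, not details taken from the evaluation.

```python
# Illustrative sketch only: logistic-regression score-level fusion of
# several hypothetical speaker-recognition subsystems, compared to the
# single subsystems via equal error rate (EER). None of the numbers
# below come from SRE'16. Requires numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic per-trial scores: rows are trials, columns are subsystems.
n_tar, n_non, n_sys = 500, 5000, 3
tar = rng.normal(loc=2.0, scale=1.2, size=(n_tar, n_sys))  # target trials
non = rng.normal(loc=0.0, scale=1.0, size=(n_non, n_sys))  # non-target trials

X = np.vstack([tar, non])
y = np.concatenate([np.ones(n_tar), np.zeros(n_non)])

# Fusion learns one weight per subsystem plus an offset; the fused score
# is the log-odds of the target hypothesis. In a real evaluation these
# weights would be trained on the development set (whose effective use
# the abstract highlights), not on the trials being scored.
fused = LogisticRegression().fit(X, y).decision_function(X)

def eer(tar_scores, non_scores):
    """Operating point where the miss rate equals the false-alarm rate."""
    thr = np.sort(np.concatenate([tar_scores, non_scores]))
    p_miss = np.array([(tar_scores < t).mean() for t in thr])
    p_fa = np.array([(non_scores >= t).mean() for t in thr])
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2

print("single-system EERs:",
      [round(eer(tar[:, k], non[:, k]), 3) for k in range(n_sys)])
print("fused EER:", round(eer(fused[y == 1], fused[y == 0]), 3))
```

In this toy setup the fused EER comes out lower than any single subsystem's; the abstract's point is that in SRE'16 that gap was modest, with the top single systems reaching about 90% of the best fusion performance.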
Sadjadi, S., Kheyrkhah, T., Tong, A., Greenberg, C., Reynolds, D., Singer, E., Mason, L. and Hernandez-Cordero, J. (2017), The 2016 NIST Speaker Recognition Evaluation, Interspeech 2017, Stockholm, Sweden, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=922849 (Accessed October 7, 2025)