Speech Intelligibility Demo

Project Description
The Public Safety Communications Research (PSCR) program, with support from the Department of Homeland Security Science and Technology (DHS S&T), is studying the intelligibility of speech that is combined with background noise and then digitally encoded. The Listen section that follows provides some example Modified Rhyme Test (MRT) recordings from the study.

Audio coders/decoders (codecs) provide efficient (low data rate) digital representations of audio signals. When the signal is speech alone, a speech-specific signal model leads to efficient coding with good intelligibility. But when significant levels of background noise are combined with speech, broader or more robust signal models are required, and these in turn typically require higher data rates. As a result, the examples that use higher bit rates can be expected to be more intelligible.

Audio Details and Samples
The Learn section provides a guide to the terms describing the samples. The Listen section presents audio samples you can play for various codecs, bands, and bit rates.

» See the resulting report: NTIA Technical Report TR-15-520 Speech Codec Intelligibility Testing in Support of Mission-Critical Voice Applications for LTE.

Learn

Codec

FM — Software simulation of analog frequency modulation (FM) land-mobile radio (full quieting signal, no interference).
P25-HR — Association of Public Safety Communications Officials International (APCO) Project 25 half-rate codec (AMBE+2™, version 1.6).
P25-FR — APCO Project 25 full-rate codec (AMBE+2™, version 1.6).
AMR — Adaptive Multi-Rate codec. From http://www.3gpp.org/DynaReport/26104.htm. ANSI-C code for the floating-point Adaptive Multi-Rate (AMR) speech codec, ETSI/3GPP TS 26.104, Rev. 10.0.0, Apr. 2011.
AMR-WB — Adaptive Multi-Rate Wideband codec. From http://www.3gpp.org/DynaReport/26204.htm. Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; ANSI-C code, European Telecommunications Standards Institute (ETSI)/3rd Generation Partnership Project (3GPP) TS 26.204, Rev. 10.0.0, Apr. 2011.
EVS — 3GPP Enhanced Voice Services Codec. From http://www.3gpp.org/DynaReport/26442.htm, Codec for Enhanced Voice Services (EVS); ANSI-C code (fixed-point) 26442-c00-ANSI-C_source_code.zip, (Version 12.0.0) Sept. 2014.
Opus — Internet Engineering Task Force (IETF) Opus Interactive Audio Codec. From http://www.opus-codec.org, libopus 1.1, Oct. 2014.

Bit Rate
Raw data rate required between encoder and decoder in bits per second. Does not include any forward error correction. No bit errors or packet loss.
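To give a feel for what these raw rates mean, the sketch below converts a bit rate into the number of bits available per speech frame. The 20 ms frame duration is an assumption made only for illustration; this page does not state the frame length used by any of the codecs. The function name is hypothetical, and the rates are taken from the Listen table.

```python
def bits_per_frame(bit_rate_bps, frame_ms=20.0):
    """Bits available for one frame at a given raw bit rate.

    frame_ms is an assumed frame duration (20 ms by default),
    not a value taken from this demo.
    """
    return bit_rate_bps * frame_ms / 1000.0


# A few of the bit rates listed in the Listen table, in bits per second.
for rate in (2450, 4400, 4750, 12200, 24400):
    print(f"{rate} b/s -> {bits_per_frame(rate):.0f} bits per assumed 20 ms frame")
```

Because the bit rate excludes forward error correction, the rate actually sent over a channel would be higher than the figures used here.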

Audio Bandwidth

Narrowband — Supported audio bandwidth has a nominal upper limit between 3 and 4 kHz.
Wideband — Supported audio bandwidth has a nominal upper limit between 7 and 8 kHz.
Fullband — Supported audio bandwidth has a nominal upper limit near 20 kHz.
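The bandwidth classes above relate to sampling rate through the Nyquist criterion: a signal must be sampled at a rate of at least twice its highest frequency. The sketch below applies that rule to the nominal upper limits listed above. The dictionary and function names are hypothetical, and the sampling rates actually used in the study are not stated on this page.

```python
# Nominal upper limits from the definitions above, in Hz.
# For the ranges, the upper end of each range is used.
NOMINAL_UPPER_LIMIT_HZ = {
    "Narrowband": 4000,
    "Wideband": 8000,
    "Fullband": 20000,
}


def min_sample_rate_hz(upper_limit_hz):
    """Lowest sampling rate that satisfies the Nyquist criterion."""
    return 2 * upper_limit_hz


for band, limit in NOMINAL_UPPER_LIMIT_HZ.items():
    print(f"{band}: upper limit {limit} Hz, sample rate >= {min_sample_rate_hz(limit)} Hz")
```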

Noise

Club — Speech combined with recorded nightclub noise at 0 dB signal-to-noise ratio (SNR).
Siren — Speech combined with recorded fire truck siren at 0 dB SNR.
Quiet — No extra noise added to speech.
All — Contains club, siren, and quiet versions, in that order.

The demonstration provided here includes only club and siren noise, but the study used six noises: alarm, club, coffeehouse, nozzle, saw, and siren. To listen to all six without speech, choose Play to hear 5 seconds of each noise, in that order.
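A 0 dB SNR means the speech and the noise have equal power. The sketch below shows one way to scale a noise signal to reach a target SNR before adding it to speech. It uses the simple mean power of each signal, which is an assumption: this page does not say how the study measured speech and noise levels. All function names are hypothetical.

```python
import math


def mean_power(samples):
    """Average power: the mean of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)


def measured_snr_db(speech, noise):
    """SNR in dB computed from the mean power of each signal."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale noise to the target SNR, then add it to the speech.

    Returns (mixed, scaled_noise). Both inputs must have the same length.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise is silent and cannot be scaled")
    # Noise power needed so that speech power / noise power hits the target.
    target_noise_power = mean_power(speech) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


# Synthetic stand-ins for real recordings: a sine tone and a repeating pattern.
speech = [math.sin(2 * math.pi * i / 16) for i in range(160)]
noise = [0.1 * ((i * 7) % 5 - 2) for i in range(160)]
mixed, scaled_noise = mix_at_snr(speech, noise, 0.0)
print(f"SNR after scaling: {measured_snr_db(speech, scaled_noise):.2f} dB")
```

With real recordings, the speech level would normally be measured over active speech only, so pauses do not lower the estimate.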

Speech

Each audio sample in this demonstration contains the following speech, in this order:

Female speaker — “Please select the word pin.”
Male speaker — “Please select the word hark.”
Female speaker — “Please select the word wig.”
Male speaker — “Please select the word tip.”

Disclaimer
Certain commercial products, organizations, and companies are identified in this audio demonstration material to specify adequately the technical aspects of the available audio files. In no case does such identification imply recommendation or endorsement by PSCR, National Institute of Standards and Technology Communication Technology Laboratory (NIST CTL), or National Telecommunications and Information Administration Institute for Telecommunication Sciences (NTIA ITS), nor does it imply that the products, organizations, or companies identified are necessarily the best available for the particular application or use.

Listen

Audio Samples
Codec	Bit Rate (b/s)	Narrowband	Wideband	Fullband
FM	Not Applicable	Club, Siren, Quiet, All	Not Applicable	Not Applicable
P25-HR	2,450	Club, Siren, Quiet, All	Not Applicable	Not Applicable
P25-FR	4,400	Club, Siren, Quiet, All	Not Applicable	Not Applicable
AMR	4,750	Club, Siren, Quiet, All	Not Applicable	Not Applicable
EVS	5,900	Club, Siren, Quiet, All	Club, Siren, Quiet, All	Not Applicable
Opus	5,900	Club, Siren, Quiet, All	Club, Siren, Quiet, All	Not Applicable
AMR	12,200	Club, Siren, Quiet, All	Not Applicable	Not Applicable
AMR-WB	12,650	Not Applicable	Club, Siren, Quiet, All	Not Applicable
EVS	13,200	Club, Siren, Quiet, All	Club, Siren, Quiet, All	Not Applicable
Opus	13,200	Club, Siren, Quiet, All	Club, Siren, Quiet, All	Not Applicable
AMR-WB	23,850	Not Applicable	Club, Siren, Quiet, All	Not Applicable
EVS	24,400	Club, Siren, Quiet, All	Club, Siren, Quiet, All	Club, Siren, Quiet, All
Opus	24,400	Club, Siren, Quiet, All	Club, Siren, Quiet, All	Club, Siren, Quiet, All

Created September 21, 2016, Updated August 6, 2024

Public Safety Communications Research Division
