Augmenting Deep Learning Models for Speech Emotion Recognition

Ram Sriram; Dinesh Manocha; Sarala Padi



Published

October 19, 2020

Author(s)

Ram Sriram, Dinesh Manocha, Sarala Padi

Abstract

We present a Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition (SER). MWA-SER is a unimodal approach that focuses on two key components: designing the speech augmentation method and building the deep learning model that recognizes the underlying emotion of an audio signal. Our proposed multi-window augmentation approach generates additional data samples from each speech signal by using multiple window sizes in the audio feature extraction process. We show that our augmentation method, combined with a deep learning model, improves speech emotion recognition performance. We evaluate our approach on three benchmark datasets (IEMOCAP, SAVEE, and RAVDESS) and show that the multi-window model outperforms a single-window model. Choosing the window size is an essential step in audio feature extraction, so we perform extensive experimental evaluations to find the best window choice and to explore the effect of windowing on SER.
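The core idea of multi-window augmentation is that the same utterance, analyzed with different window sizes, yields several distinct feature views that all carry the same emotion label. The abstract does not specify the features, window sizes, or model, so the following is only a minimal sketch under stated assumptions: log-power spectrogram features, window sizes of 25, 50, and 100 ms, a 10 ms hop, a Hamming window, and a fixed 2048-point FFT so every view has the same frequency dimension. The function names (`frame_signal`, `log_spectrogram`, `multi_window_augment`) are hypothetical and do not come from the paper.

```python
import numpy as np


def frame_signal(signal: np.ndarray, win_len: int, hop_len: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of win_len samples."""
    if len(signal) < win_len:
        # Zero-pad short signals so at least one full frame exists.
        signal = np.pad(signal, (0, win_len - len(signal)))
    n_frames = 1 + (len(signal) - win_len) // hop_len
    # Row i holds the sample indices of frame i.
    idx = np.arange(win_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]


def log_spectrogram(signal: np.ndarray, sr: int, win_ms: float,
                    hop_ms: float = 10.0, n_fft: int = 2048) -> np.ndarray:
    """Log-power spectrogram with a Hamming window of win_ms milliseconds (assumed feature)."""
    win_len = int(sr * win_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    if win_len > n_fft:
        raise ValueError("window longer than FFT size")
    frames = frame_signal(signal, win_len, hop_len) * np.hamming(win_len)
    # Zero-padding every frame to n_fft keeps the frequency axis identical across windows.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power + 1e-10)


def multi_window_augment(signal: np.ndarray, sr: int, label,
                         window_sizes_ms=(25, 50, 100), hop_ms: float = 10.0,
                         n_fft: int = 2048):
    """Return one (features, label) training sample per window size."""
    return [(log_spectrogram(signal, sr, w, hop_ms, n_fft), label)
            for w in window_sizes_ms]
```

For a 2-second utterance sampled at 16 kHz, `multi_window_augment(signal, 16000, "angry")` turns one labeled recording into three samples whose frame counts differ by window size but which share a 1025-bin frequency axis. Fixing `n_fft` is a design choice made here for convenience; a real pipeline might instead use MFCCs or mel filterbanks and pad or crop the time axis before batching.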

Citation

Arxiv

Pub Weblink

https://arxiv.org/

Pub Type

Websites


Keywords

Artificial Intelligence, Speech Recognition, Emotion, Testing, Evaluation

Information technology and Artificial intelligence

Citation

Sriram, R., Manocha, D. and Padi, S. (2020), Augmenting Deep Learning Models for Speech Emotion Recognition, Arxiv, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=931063, https://arxiv.org/ (Accessed January 9, 2026)


Created October 19, 2020, Updated September 29, 2025
