Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation

Conference Paper (2023)
Author(s)

Zhaofeng Lin (Student TU Delft)

T.B. Patel (TU Delft - Multimedia Computing)

O.E. Scharenborg (TU Delft - Multimedia Computing)

Multimedia Computing
Copyright
© 2023 Zhaofeng Lin, T.B. Patel, O.E. Scharenborg
DOI related publication
https://doi.org/10.1109/ASRU57964.2023.10389801
Publication Year
2023
Language
English
ISBN (print)
979-8-3503-0690-3
ISBN (electronic)
979-8-3503-0689-7
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication. The acoustic characteristics of whispered speech differ substantially from those of normally phonated speech, and the scarcity of adequate training data leads to low automatic speech recognition (ASR) performance. To address the data scarcity issue, we use a signal processing-based technique that transforms the spectral characteristics of normal speech into those of pseudo-whispered speech. We augment the training data of an end-to-end ASR system with pseudo-whispered speech and achieve an 18.2% relative reduction in word error rate for whispered speech compared to the baseline. Results for the individual speaker groups in the wTIMIT database show the best results for US English. Further investigation showed that the lack of glottal information in whispered speech has the largest impact on whispered speech ASR performance.
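For readers unfamiliar with the metric, the sketch below shows how a word error rate (WER) and a relative WER reduction are commonly computed. It is an illustration only and is not taken from the paper: the function names are hypothetical, the WER is the standard word-level edit distance divided by the reference length, and the example values are made up and are not the paper's results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (hypothetical, not from the paper).
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
reduction = relative_wer_reduction(baseline_wer=0.40, new_wer=0.30)
```

Under this definition, an 18.2% relative reduction means the augmented system's WER is 81.8% of the baseline WER; it does not mean the WER dropped by 18.2 percentage points.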

Files

Improving_Whispered_Speech_Rec... (pdf)
(pdf | 0.552 Mb)
- Embargo expired on 19-07-2024
License info not available