Improving whispered speech recognition using pseudo-whispered based data augmentation

Lin, Chaufang

Title

Improving whispered speech recognition using pseudo-whispered based data augmentation

Author

Lin, Chaufang (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Multimedia Computing)

Contributor

Scharenborg, O.E. (mentor)
Dauwels, J.H.G. (graduation committee)
Patel, T.B. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Electrical Engineering

Date

2023-08-29

Abstract

Whispering, characterized by its soft, breathy, and hushed qualities, is a distinct form of speech commonly used for private communication; it can also occur in pathological speech. The acoustic characteristics of whispered speech differ substantially from those of normally phonated speech, and this mismatch, together with the scarcity of adequate training data, leads to low automatic speech recognition (ASR) performance. This project aims to build an ASR system that can recognize both normal and whispered speech, and to identify which acoustic characteristics of whispered speech affect whispered speech recognition.
In my study, I use signal processing techniques to transform the spectral characteristics of normal speech into those of pseudo-whispered speech, an approach called pseudo-whispered-based data augmentation. I enhance an end-to-end ASR system by incorporating pseudo-whispered speech together with two state-of-the-art (SOTA) data augmentation methods, speed perturbation and SpecAugment, yielding an 18.2% relative reduction in word error rate compared to the strongest baseline.
Among the accented speaker groups in the wTIMIT database, recognition performance is best for US English. Further investigation shows that the lack of pitch in whispered speech has the largest impact on the performance of whispered speech ASR.
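
The abstract reports its improvement as a relative reduction in word error rate (WER). As a minimal sketch of how such a figure is usually computed, the snippet below derives a relative reduction from a baseline WER and an improved WER. The function name `relative_wer_reduction` and the example values (10.0 and 8.0) are hypothetical, chosen only for illustration; they are not results from the thesis, which reports the 18.2% figure alone.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in the same unit (for example
    percent). A positive result means the new system makes fewer errors
    than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical values for illustration only, not taken from the thesis.
    baseline = 10.0  # baseline WER in percent
    improved = 8.0   # WER after augmentation in percent
    print(f"{relative_wer_reduction(baseline, improved):.1f}% relative reduction")
```

A relative reduction differs from an absolute one: in the hypothetical example above, the absolute drop is 2.0 percentage points, while the relative reduction is 20.0%.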

Subject

automatic speech recognition
whispered speech
pseudo-whisper
signal processing

To reference this document use:

http://resolver.tudelft.nl/uuid:5f51b210-c2b5-4093-8ec3-7b6ed5bfc5c7

Part of collection

Student theses

Document type

master thesis

Rights

Files

PDF

TU_Delft_thesis_Chaufang.pdf

1.53 MB
