Evaluating the Use of Pitch Shifting to Improve Automatic Speech Recognition Performance on Southern Dutch Accents

Bachelor Thesis (2022)
Author(s)

A. Mešić (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.B. Patel – Mentor (TU Delft - Multimedia Computing)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

Joana P. Gonçalves – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Amar Mešić
Publication Year
2022
Language
English
Graduation Date
22-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Building Automatic Speech Recognizers (ASRs) remains a challenge for languages with insufficiently sized corpora or data sets. A further major issue in language corpora is bias against regionally accented speech and other speaker attributes. Some techniques, known as data augmentations, aim to improve ASR performance and reduce bias in these corpora. One audio data augmentation, pitch shifting, has increased ASR performance in other experiments. In this paper, pitch shifting is tested on the JASMIN-CGN speech data set from the Southern regions of the Netherlands. Using a hybrid GMM-HMM ASR, two baselines are developed: one using all speech data from the region, the other using only native speech. For the former, pitch shifting is found not to improve Word Error Rate (WER) performance or reduce bias; for the latter, augmentation improves WER performance and reduces bias for certain speaker groups.
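
For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring code used in the thesis; the function name `word_error_rate` and the example strings are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not taken from the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one of six reference words is dropped.
    print(word_error_rate("de kat zit op de mat", "de kat zit op mat"))
```

Under this definition a lower WER means better recognition, and the value can exceed 1.0 when the hypothesis contains many inserted words.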

Files

RP_Amar_final_v2.pdf
(pdf | 0.694 Mb)
License info not available