Evaluating the Use of Frequency Masking on a Hybrid Automatic Speech Recognizer for Transitional Dutch Accent of JASMIN-CGN Corpus

Bachelor Thesis (2022)

Author(s)

D.A. Bălan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.B. Patel – Mentor (TU Delft - Multimedia Computing)

Odette Scharenborg – Mentor (TU Delft - Multimedia Computing)

Joana P. P. Gonçalves – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

ASR, JASMIN-CGN, Audio augmentation, Bias, Speech recognition, Hybrid ASR, Dutch, Speech augmentation, SpecAugment

To reference this document use:

https://resolver.tudelft.nl/uuid:a410e9f6-12ac-41be-b415-367e2f7243a3

Publication Year

2022

Language

English

Graduation Date

22-06-2022

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Many experiments have been conducted with Automatic Speech Recognition (ASR) systems, but most focus either on specific speaker categories or on a language in general. As a result, such ASR systems can be biased towards particular genders, age groups, or dialects. Analyzing and reducing this bias requires models trained on significant amounts of data, which some corpora lack. Augmentation techniques can help here by generating additional, distinct training data without any further collection. This paper explores the use of SpecAugment's frequency masking on one such corpus, JASMIN-CGN, for the Transitional regional accent of Dutch, with a hybrid GMM-HMM architecture, in order to reduce the gender and age bias for this specific dialect. The experiments show that SpecAugment does not lower the word error rate (WER): the augmented model reaches 20.8% overall, compared to 19.5% for the baseline model. On the contrary, it even increases the bias for age. The results are mainly attributed to the combination of a small amount of data and the hybrid architecture used, which suggests that SpecAugment is a useful augmentation policy only for end-to-end models.
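
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of SpecAugment-style frequency masking, not the thesis's own implementation. It assumes the features are stored as a list of frames (time by frequency channels), that masked channels are set to zero, and that one mask is applied by default; the function name `frequency_mask` and its parameters are hypothetical.

```python
import random


def frequency_mask(spectrogram, max_width, num_masks=1, rng=None):
    """Return a copy of `spectrogram` with random frequency bands zeroed.

    spectrogram: list of frames, each a list of per-channel values.
    max_width:   largest allowed mask width (the parameter F in SpecAugment).
    num_masks:   number of masks to apply (assumed default of 1).
    rng:         optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    num_channels = len(spectrogram[0])
    # Copy every frame so the original features stay untouched.
    masked = [list(frame) for frame in spectrogram]
    for _ in range(num_masks):
        # Mask width f is drawn uniformly from 0..F (capped at the channel count).
        width = rng.randint(0, min(max_width, num_channels))
        # Start channel f0 is drawn so that the band fits inside the spectrogram.
        start = rng.randint(0, num_channels - width)
        # The same band is masked in every frame (assumption: mask value 0.0).
        for frame in masked:
            for channel in range(start, start + width):
                frame[channel] = 0.0
    return masked


# Usage: mask up to 3 of 8 channels in a toy 5-frame spectrogram.
toy = [[1.0] * 8 for _ in range(5)]
augmented = frequency_mask(toy, max_width=3, rng=random.Random(0))
```

Each call produces a differently masked copy of the same utterance, which is how the technique yields additional training examples without collecting new recordings.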

Files

Research_paper_Dragos_Final.pd... (pdf)

(pdf | 1.71 Mb)

License info not available