Improving ASR performance on Jasmin Flemish Dutch data by performing frequency perturbation
N. Sweijen (TU Delft - Electrical Engineering, Mathematics and Computer Science)
O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)
T.B. Patel – Mentor (TU Delft - Multimedia Computing)
Joana P. Gonçalves – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Automatic speech recognition (ASR) systems are now widely used. However, for a technology so common in daily life, ASR remains considerably biased: it does not work equally well for everyone, and recognition results differ with a speaker's gender, age, and dialect. The goal of this paper is to reduce this bias for one dialect, Flemish Dutch, by improving recognition performance on it. Because collecting additional speech data is expensive, we instead use a data augmentation technique to enlarge the training data and lower the word error rate for this dialect. The technique used is frequency perturbation, which amplifies or attenuates the amplitude of selected frequency bands. With it, we obtained a slight improvement in recognition performance on Flemish Dutch, although performance on this dialect remains clearly worse than on other Dutch dialects.
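To make the idea of amplifying or reducing the amplitude of certain frequency bands concrete, the following is a minimal sketch and not the implementation used in this work. It assumes the audio has already been converted to a magnitude spectrogram (a list of frames, each a list of per-bin magnitudes). The function name `perturb_frequency_bands` and the parameters `num_bands`, `max_band_width`, and `gain_range` are hypothetical and chosen only for illustration; the abstract does not state how bands or gains were selected.

```python
import random


def perturb_frequency_bands(spectrogram, num_bands=2, max_band_width=3,
                            gain_range=(0.5, 1.5), rng=None):
    """Scale randomly chosen frequency bands of a magnitude spectrogram.

    `spectrogram` is a list of frames; each frame is a list of magnitudes,
    one per frequency bin. Returns (perturbed_copy, per_bin_gains).
    All parameter names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    num_bins = len(spectrogram[0])
    gains = [1.0] * num_bins  # a gain of 1.0 leaves a bin unchanged

    for _ in range(num_bands):
        # Pick a contiguous band of bins and one gain for the whole band.
        width = rng.randint(1, min(max_band_width, num_bins))
        start = rng.randint(0, num_bins - width)
        gain = rng.uniform(*gain_range)  # < 1 attenuates, > 1 amplifies
        for b in range(start, start + width):
            gains[b] *= gain  # overlapping bands compound their gains

    # Apply the same per-bin gains to every frame; the input is not modified.
    perturbed = [[m * g for m, g in zip(frame, gains)] for frame in spectrogram]
    return perturbed, gains


# Toy example: 3 frames with 8 frequency bins each.
spec = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] for _ in range(3)]
augmented, gains = perturb_frequency_bands(spec, rng=random.Random(0))
```

In an augmentation setting, each perturbed copy would typically be added to the training set with the transcript of the original utterance, so the amount of training data grows without new recordings.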