Search results | TU Delft Repositories

Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech

Zhang, Y. (author), Herygers, Aaricia (author), Patel, T.B. (author), Yue, Z. (author), Scharenborg, O.E. (author)

Automatic speech recognition (ASR) should serve every speaker, not only the majority “standard” speakers of a language. In order to build inclusive ASR, mitigating the bias against speaker groups who speak in a “non-standard” or “diverse” way is crucial. We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system....

conference paper 2023

Evaluating the Use of Pitch Shifting to Improve Automatic Speech Recognition Performance on Southern Dutch Accents

Mešić, Amar (author)

Building Automatic Speech Recognizers (ASRs) has been a challenge in languages with insufficiently sized corpora or data sets. A further major issue in language corpora is bias against regionally accented speech and other speaker attributes. There are some techniques to improve ASR performance and reduce biases in these corpora, known as data...

bachelor thesis 2022

Improving Northern Regional Dutch Speech Recognition by Adapting Perturbation-based Data Augmentation

Zhlebinkov, Nikolay (author)

Automatic speech recognition (ASR) does not perform equally well on every speaker. There is bias against many attributes, including accent. To train Dutch ASR, there exists CGN (Corpus Gesproken Nederlands) and, as an extension, the JASMIN corpus with annotated accented data. This paper focuses on improving ASR performance for NRAD (Northern...

bachelor thesis 2022

Evaluating the Use of Frequency Masking on a Hybrid Automatic Speech Recognizer for Transitional Dutch Accent of JASMIN-CGN Corpus

Bălan, Dragos (author)

Many experiments have been conducted with Automatic Speech Recognition (ASR) systems, but many either focus on specific speaker categories or on a language in general. Therefore, bias could occur in such ASR systems towards different genders, age groups, or dialects. However, to analyze and reduce bias, the models require significant amounts of data...

bachelor thesis 2022

Evaluating the Effect of SpecSwap for Purposes of Improving WER Performance of the Western Dutch Region Using the JASMIN-CGN Dataset

Marinov, Alves (author)

A problem prevalent in many modern-day Automatic Speech Recognition (ASR) systems is the presence of bias and its reduction. Bias can be observed when an ASR system performs worse on a subset of its speakers compared to the rest rather than having the same overall generalization for everyone. This can be seen by using Word Error Rates (WER) as a...

bachelor thesis 2022

Mitigating bias against non-native accents

Zhang, Yuanyuan (author)

Automatic Speech Recognition (ASR) systems have seen substantial improvements in the past decade; however, not for all speaker groups. Recent research shows that bias exists against different types of speech, including non-native accents, in state-of-the-art (SOTA) ASR systems. To attain inclusive speech recognition, i.e., ASR for everyone...

master thesis 2022

Discovering phonetic inventories with crosslingual automatic speech recognition

Żelasko, Piotr (author), Feng, S. (author), Moro Velázquez, Laureano (author), Abavisani, Ali (author), Bhati, Saurabhchand (author), Scharenborg, O.E. (author), Hasegawa-Johnson, Mark (author), Dehak, Najim (author)

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order...

journal article 2022

Improving Automatic Speech Recognition For Dysarthric Speech

Prananta, Luke (author)

master thesis 2021

Signal-processing of audio for speech-recognition

de Jong, Joep (author)

The transcription of voice using neural networks is a technique that deserves attention, as speech assistants are becoming increasingly popular. Neural networks often have difficulty distinguishing between a talking person and noise. Humans have a much better understanding of this and could possibly apply their knowledge of the...

bachelor thesis 2021

Assessing the performance of the TDNN-BLSTM architecture for phoneme recognition of English speech

Klom, Irene (author)

This research studies the Projected Bidirectional Long Short-Term Memory Time Delayed Neural Network (TDNN-BLSTM) model for English phoneme recognition. It contributes to the field of phoneme recognition by analyzing the performance of the TDNN-BLSTM model based on the TIMIT corpus and the Buckeye corpus, respectively containing read speech and...

bachelor thesis 2021

Evaluating the performance of TDNN-BLSTM on Mandarin read and spontaneous speech

Chiroşca, Mihail (author)

A limitation of current ASR systems is so-called out-of-vocabulary words. The solution to overcome this limitation is to use APR systems. Previous research on Dutch APR systems identified the Time Delayed Bidirectional Long-Short Term Memory Neural Network (TDNN-BLSTM) as one of the best-performing state-of-the-art NN architectures for PR. The goal...

bachelor thesis 2021

Phon Times: Improving Dutch phoneme recognition

Levenbach, Robert (author)

In this research, Dutch phoneme recognition (PR) is researched and improved. The last research on Dutch PR dates back to 1995. This research presents Dutch PR in modern daylight by researching state-of-the-art techniques found in research on other languages and implementing them on Dutch PR. The goal of this research is to find the current best...

master thesis 2021

That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Żelasko, Piotr (author), Moro-Velázquez, Laureano (author), Hasegawa-Johnson, Mark (author), Scharenborg, O.E. (author), Dehak, Najim (author)

Only a handful of the world’s languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some...

conference paper 2020

Computer-Based Social Anxiety Regulation in Virtual Reality Exposure Therapy

Hartanto, D. (author)

Social anxiety disorder (SAD), commonly referred to as social phobia, is an immense and unreasonable fear of social interaction. Cognitive behaviour therapy (CBT) is the most thoroughly studied nonpharmacologic approach to the treatment of SAD patients. In CBT, patients are gradually, in vivo, exposed to anxiety-provoking...

doctoral thesis 2019

Command Recognition on Intermittently-Powered Devices

Schilder, Patrick (author)

The Internet of Things (IoT) is expected to include billions of tiny devices that collect, process, and communicate sensory data. As of now, batteries power these devices. Batteries, however, are large, expensive, and short-lived - even the rechargeable ones wear out in a few years. Therefore, they are not a sustainable powering solution. Tiny...

master thesis 2019

Limits on Modeling Compensation in Multimodal DNNs for Audio Visual Speech Recognition

Chandrasekharan Nair, Sreejith (author)

Speech is a natural way of communicating that does not require us to develop any new skills in order to be able to interact with electronic devices. With the evolution of technology, speech has become one of the primary means of communication. Speech recognition is a form of multimedia content analysis, where the information carried in a speech...

master thesis 2017

Distance measures for speech recognition

Hunt, M.J. (author), Lefèbvre, . (author)

This report is concerned with the application of aspects of statistical pattern classification to speech recognition. It presents an extension of linear discriminant analysis to the case where the classes are unknown. This extension provides solutions to the interrelated problems of the design of acoustic representations and spectral distance...

report 1989