Search results | TU Delft Repositories

Voice Based Interfaces for Supermarket robots using Large Language Models

Nandkumar, CHANDRAN (author)

This thesis presents the design and evaluation of a comprehensive system for developing voice-based interfaces to support users in supermarkets. These interfaces enable customers to convey their needs across both generic and specific queries. While current state-of-the-art systems like GPTs by OpenAI are easily accessible and adaptable,...

master thesis 2024

Natural User Interface in Augmented Reality to Control Spot: A Large Scale User Study on Speech and Gesture Control of Robots With The Microsoft HoloLens

van der Linden, Jesse (author)

The increasing presence of robots calls for a more seamless and information-rich communication method between humans and robots. This paper explores how natural user interface (NUI) modalities, particularly speech and gesture controls, can be used through augmented reality (AR) to operate robots....

master thesis 2024

Improving End-to-End Models for Children’s Speech Recognition

Patel, T.B. (author), Scharenborg, O.E. (author)

Children’s Speech Recognition (CSR) is a challenging task due to the high variability in children’s speech patterns and limited amount of available annotated children’s speech data. We aim to improve CSR in the often-occurring scenario that no children’s speech data is available for training the Automatic Speech Recognition (ASR) systems....

journal article 2024

Applying Large-Scale Weakly Supervised Automatic Speech Recognition to Air Traffic Control

van Doorn, Jan Laurenszoon (author)

The application of automatic speech recognition in the air traffic control domain has been researched extensively. However, its primary application remains in the training and simulation of air traffic controllers. This is due to the insufficient performance of automatic speech recognition in specific environments, such as air traffic control,...

master thesis 2023

Improving whispered speech recognition using pseudo-whispered based data augmentation

Lin, Chaufang (author)

Whispering, characterized by its soft, breathy, and hushed qualities, serves as a distinct form of speech commonly employed for private communication and can also occur in cases of pathological speech. The acoustic characteristics of whispered speech differ substantially from normally phonated speech and the scarcity of adequate training data...

master thesis 2023

Automatic Speech Recognition for Air Traffic Control Using Open Data

Lubberding, Jari (author)

Air Traffic Control (ATC) is tasked with ensuring safe separation between aircraft in a given Controlled Traffic Region (CTR). To achieve this, an Air Traffic Controller (ATCo) verbally gives clearances using over-the-air communication. These clearances are kept track of by the ATCo using so-called ‘flight-strips’, which in modern systems are...

master thesis 2023

Mitigating Regional Accent Bias in ASR Systems

Li, Zirui (author)

End-to-end Automatic Speech Recognition (ASR) systems have improved drastically in recent years and work extremely well on many large datasets. However, research shows that these models fail to capture the variability in speech production and are biased against the variation caused by regionally accented speech. Moreover, ASR research on...

master thesis 2023

Towards inclusive automatic speech recognition

Feng, S. (author), Halpern, B.M. (author), Kudina, O. (author), Scharenborg, O.E. (author)

Practice and recent evidence show that state-of-the-art (SotA) automatic speech recognition (ASR) systems do not perform equally well for all speaker groups. Many factors can cause this bias against different speaker groups. This paper, for the first time, systematically quantifies and finds speech recognition bias against gender, age, regional...

journal article 2023

Improving Adaptive Learning Models Using Prosodic Speech Features

Wilschut, Thomas (author), Sense, Florian (author), Scharenborg, O.E. (author), van Rijn, Hedderik (author)

Cognitive models of memory retrieval aim to describe human learning and forgetting over time. Such models have been successfully applied in digital systems that aid in memorizing information by adapting to the needs of individual learners. The memory models used in these systems typically measure the accuracy and latency of typed retrieval...

conference paper 2023

Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation

Lin, Zhaofeng (author), Patel, T.B. (author), Scharenborg, O.E. (author)

Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication. The acoustic characteristics of whispered speech differ substantially from normally phonated speech and the scarcity of adequate training data leads to low automatic speech recognition (ASR) performance. To...

conference paper 2023

The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization And Recognition

Wang, Zhe (author), Wu, Shilong (author), Chen, Hang (author), He, Mao-Kui (author), Du, Jun (author), Lee, Chin-Hui (author), Chen, Jingdong (author), Watanabe, Shinji (author), Siniscalchi, Sabato Marco (author), Scharenborg, O.E. (author), Liu, Diyuan (author)

The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting the research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 challenge has two tracks: 1) audio-visual speaker diarization (AVSD),...

conference paper 2023

Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech

Zhang, Y. (author), Herygers, Aaricia (author), Patel, T.B. (author), Yue, Z. (author), Scharenborg, O.E. (author)

Automatic speech recognition (ASR) should serve every speaker, not only the majority “standard” speakers of a language. In order to build inclusive ASR, mitigating the bias against speaker groups who speak in a “non-standard” or “diverse” way is crucial. We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system....

conference paper 2023

Bias Mitigation Against Non-native Speakers in Dutch ASR

Zhang, Yixuan (author)

One of the most important problems that needs tackling for wide deployment of Automatic Speech Recognition (ASR) is the bias in ASR, i.e., ASRs tend to generate more accurate predictions for certain speaker groups while making more errors on speech from others. In this thesis, we aim to reduce bias against non-native speakers of Dutch compared...

master thesis 2022

Analyzing and comparing different self-supervised learning speech pre-trained models in the view of phonetics

Ji, Hang (author)

In this thesis, we analyzed and compared speech representations extracted from different frozen self-supervised learning (SSL) speech pre-trained models on their ability to capture articulatory feature (AF) information and their subsequent prediction of phone recognition performance in within-language and cross-language scenarios. Specifically,...

master thesis 2022

Evaluating the Use of Pitch Shifting to Improve Automatic Speech Recognition Performance on Southern Dutch Accents

Mešić, Amar (author)

Building Automatic Speech Recognizers (ASRs) has been a challenge in languages with insufficiently sized corpora or data sets. Another major issue in language corpora is bias against regionally accented speech and other speaker attributes. There are some techniques to improve ASR performance and reduce biases in these corpora, known as data...

bachelor thesis 2022

Improving Northern Regional Dutch Speech Recognition by Adapting Perturbation-based Data Augmentation

Zhlebinkov, Nikolay (author)

Automatic speech recognition (ASR) does not perform equally well on every speaker. There is bias against many attributes, including accent. To train Dutch ASR, there exists CGN (Corpus Gesproken Nederlands) and as an extension, the JASMIN corpus with annotated accented data. This paper focuses on improving ASR performance for NRAD (Northern...

bachelor thesis 2022

Evaluating the Use of Frequency Masking on a Hybrid Automatic Speech Recognizer for Transitional Dutch Accent of JASMIN-CGN Corpus

Bălan, Dragos (author)

There are many experiments conducted with Automatic Speech Recognition (ASR) systems, but many either focus on specific speaker categories or on a language in general. Therefore, bias could occur in such ASR systems towards different genders, age groups, or dialects. But, to analyze and reduce bias, the models require significant amounts of data...

bachelor thesis 2022

Evaluating the Effect of SpecSwap for Purposes of Improving WER Performance of the Western Dutch Region Using the JASMIN-CGN Dataset

Marinov, Alves (author)

A problem prevalent in many modern-day Automatic Speech Recognition (ASR) systems is the presence of bias and its reduction. Bias can be observed when an ASR system performs worse on a subset of its speakers compared to the rest rather than having the same overall generalization for everyone. This can be seen by using Word Error Rates (WER) as a...

bachelor thesis 2022

Mitigating bias against non-native accents

Zhang, Yuanyuan (author)

Automatic Speech Recognition (ASR) systems have seen substantial improvements in the past decade; however, not for all speaker groups. Recent research shows that bias exists against different types of speech, including non-native accents, in state-of-the-art (SOTA) ASR systems. To attain inclusive speech recognition, i.e., ASR for everyone...

master thesis 2022

Bringing digital scribes into orthopedic consultations: Towards AI-assisted clinical documentation

Magyari, Reka (author)

Clinical documentation takes up 40% of clinicians’ time. To ease the administrative burden of clinicians, digital scribes offer the potential to automate clinical note taking. Digital scribes are intelligent documentation software tools that combine automated speech recognition (ASR) and natural language processing (NLP). Digital scribes transcribe...

master thesis 2022
