Search results | TU Delft Repositories

Improving End-to-End Models for Children’s Speech Recognition

Patel, T.B. (author), Scharenborg, O.E. (author)

Children’s Speech Recognition (CSR) is a challenging task due to the high variability in children’s speech patterns and the limited amount of available annotated children’s speech data. We aim to improve CSR in the often-occurring scenario that no children’s speech data is available for training the Automatic Speech Recognition (ASR) systems....

journal article 2024

Applying Large-Scale Weakly Supervised Automatic Speech Recognition to Air Traffic Control

van Doorn, Jan Laurenszoon (author)

The application of automatic speech recognition in the air traffic control domain has been researched extensively. However, its primary application remains in the training and simulation of air traffic controllers. This is due to the insufficient performance of automatic speech recognition in specific environments, such as air traffic control,...

master thesis 2023

Improving whispered speech recognition using pseudo-whispered based data augmentation

Lin, Chaufang (author)

Whispering, characterized by its soft, breathy, and hushed qualities, serves as a distinct form of speech commonly employed for private communication and can also occur in cases of pathological speech. The acoustic characteristics of whispered speech differ substantially from normally phonated speech and the scarcity of adequate training data...

master thesis 2023

Automatic Speech Recognition for Air Traffic Control Using Open Data

Lubberding, Jari (author)

Air Traffic Control (ATC) is tasked with ensuring safe separation between aircraft in a given Controlled Traffic Region (CTR). To achieve this, an Air Traffic Controller (ATCo) verbally gives clearances using over-the-air communication. The ATCo keeps track of these clearances using so-called ‘flight-strips’, which in modern systems are...

master thesis 2023

Mitigating Regional Accent Bias in ASR Systems

Li, Zirui (author)

End-to-end Automatic Speech Recognition (ASR) systems have improved drastically in recent years and work extremely well on many large datasets. However, research shows that these models fail to capture the variability in speech production and are biased against the variation caused by regionally accented speech. Moreover, ASR research on...

master thesis 2023

Towards inclusive automatic speech recognition

Feng, S. (author), Halpern, B.M. (author), Kudina, O. (author), Scharenborg, O.E. (author)

Practice and recent evidence show that state-of-the-art (SotA) automatic speech recognition (ASR) systems do not perform equally well for all speaker groups. Many factors can cause this bias against different speaker groups. This paper, for the first time, systematically quantifies and finds speech recognition bias against gender, age, regional...

journal article 2023

Improving Adaptive Learning Models Using Prosodic Speech Features

Wilschut, Thomas (author), Sense, Florian (author), Scharenborg, O.E. (author), van Rijn, Hedderik (author)

Cognitive models of memory retrieval aim to describe human learning and forgetting over time. Such models have been successfully applied in digital systems that aid in memorizing information by adapting to the needs of individual learners. The memory models used in these systems typically measure the accuracy and latency of typed retrieval...

conference paper 2023

Bias Mitigation Against Non-native Speakers in Dutch ASR

Zhang, Yixuan (author)

One of the most important problems that needs tackling for wide deployment of Automatic Speech Recognition (ASR) is the bias in ASR, i.e., ASRs tend to generate more accurate predictions for certain speaker groups while making more errors on speech from others. In this thesis, we aim to reduce bias against non-native speakers of Dutch compared...

master thesis 2022

Bringing digital scribes into orthopedic consultations: Towards AI-assisted clinical documentation

Magyari, Reka (author)

Clinical documentation takes up 40% of clinicians’ time. To ease the administrative burden on clinicians, digital scribes offer the potential to automate clinical note taking. Digital scribes are intelligent documentation software systems that combine automated speech recognition (ASR) and natural language processing (NLP). Digital scribes transcribe...

master thesis 2022

The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results

Chen, Hang (author), Zhou, Hengshun (author), Du, Jun (author), Lee, Chin-Hui (author), Chen, Jingdong (author), Watanabe, Shinji (author), Siniscalchi, Sabato Marco (author), Scharenborg, O.E. (author), Liu, Di-Yuan (author)

In this paper we discuss the rationale of the Multimodal Information Based Speech Processing (MISP) Challenge, and provide a detailed description of the data recorded, the two evaluation tasks and the corresponding baselines, followed by a summary of submitted systems and evaluation results. The MISP Challenge aims at tackling speech processing...

conference paper 2022

Low-resource automatic speech recognition and error analyses of oral cancer speech

Halpern, B.M. (author), Feng, S. (author), van Son, Rob (author), van den Brekel, Michiel (author), Scharenborg, O.E. (author)

In this paper, we introduce a new corpus of oral cancer speech and present our study on the automatic recognition and analysis of oral cancer speech. A two-hour English oral cancer speech dataset is collected from YouTube. Formulated as a low-resource oral cancer ASR task, we investigate three acoustic modelling approaches that previously...

journal article 2022

How phonotactics affect multilingual and zero-shot ASR performance

Feng, S. (author), Żelasko, Piotr (author), Moro-Velázquez, Laureano (author), Abavisani, Ali (author), Hasegawa-Johnson, Mark (author), Scharenborg, O.E. (author), Dehak, Najim (author)

The idea of combining multiple languages’ recordings to train a single automatic speech recognition (ASR) model brings the promise of the emergence of universal speech representation. Recently, a Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training....

conference paper 2021

Word recognition in a model of visually grounded speech: An analysis using techniques inspired by human speech processing research

Scholten, J.S.M. (author)

A Visually Grounded Speech model is a neural model which is trained to embed image caption pairs closely together in a common embedding space. As a result, such a model can retrieve semantically related images given a speech caption and vice versa. The purpose of this research is to investigate whether and how a Visually Grounded Speech model...

master thesis 2020

Speech technology for unwritten languages

Scharenborg, O.E. (author), Besacier, Laurent (author), Black, Alan W. (author), Hasegawa-Johnson, Mark (author), Metze, Florian (author), Neubig, Graham (author), Stueker, Sebastian (author), Godard, Pierre (author), Mueller, M (author)

Speech technology plays an important role in our everyday life. Among other things, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping. In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard...

journal article 2020

Study of the performance of automatic speech recognition systems in speakers with Parkinson’s Disease

Moro-Velazquez, Laureano (author), Cho, JaeJin (author), Watanabe, Shinji (author), Hasegawa-Johnson, Mark A. (author), Scharenborg, O.E. (author), Kim, Heejin (author), Dehak, Najim (author)

Parkinson’s Disease (PD) affects motor capabilities of patients, who in some cases need to use human-computer assistive technologies to regain independence. The objective of this work is to study in detail the differences in error patterns from state-of-the-art Automatic Speech Recognition (ASR) systems on speech from people with and without PD....

conference paper 2019

Building an ASR System for Mboshi Using A Cross-language Definition of Acoustic Units Approach

Scharenborg, O.E. (author), Ebel, Patrick (author), Ciannella, Francesco (author), Hasegawa-Johnson, Mark (author), Dehak, Najim (author)

For many languages in the world, not enough (annotated) speech data is available to train an ASR system. Recently, we proposed a cross-language method for training an ASR system using linguistic knowledge and semi-supervised training. Here, we apply this approach to the low-resource language Mboshi. Using an ASR system trained on Dutch, Mboshi...

conference paper 2018

Towards Robust Visual Speech Recognition: Automatic Systems for Lip Reading of Dutch

Chitu, A.G. (author)

In the last two decades we have witnessed a rapid increase in computational power, governed by Moore's Law. As a side effect, cheaper and faster CPUs have become more affordable as well. As a result, many new “smart” devices have flooded the market and made information systems widespread. The number of users of information systems has also...

doctoral thesis 2010

Automatic speech recognition using dynamic Bayesian networks

Van de Lisdonk, R.H.M. (author)

New ideas to improve automatic speech recognition have been proposed that make use of context user information such as gender, age and dialect. To incorporate this information into a speech recognition system a new framework is being developed at the MMI department of the EWI faculty at the Delft University of Technology. This toolkit is called...

master thesis 2009

Modelling context in automatic speech recognition

Wiggers, P. (author)

Speech is at the core of human communication. Speaking and listening come so naturally to us that we do not have to think about them at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible, not to understand speech. For computers on the other hand, recognising speech is a daunting...

doctoral thesis 2008

Lip-reading automatons: Multimodal speech recognition

De Boo, M. (author)

Just imagine that you are standing in the concourse of Rotterdam Central Station, and you can speak into a machine to ask it the time of the next train to Amsterdam, and an electronic voice will instantly tell you the answer, including the platform number. The TU Delft Mediamatics department has been collaborating for some years with OVR ...

journal article 2002