Searched for: subject:"Automatic Speech Recognition"
(1 - 20 of 20)
document
Patel, T.B. (author), Scharenborg, O.E. (author)
Children’s Speech Recognition (CSR) is a challenging task due to the high variability in children’s speech patterns and the limited amount of annotated children’s speech data available. We aim to improve CSR in the common scenario in which no children’s speech data is available for training Automatic Speech Recognition (ASR) systems....
journal article 2024
document
van Doorn, Jan Laurenszoon (author)
The application of automatic speech recognition in the air traffic control domain has been researched extensively. However, its primary application remains in the training and simulation of air traffic controllers. This is due to the insufficient performance of automatic speech recognition in specific environments, such as air traffic control,...
master thesis 2023
document
Lin, Chaufang (author)
Whispering, characterized by its soft, breathy, and hushed qualities, serves as a distinct form of speech commonly employed for private communication and can also occur in cases of pathological speech. The acoustic characteristics of whispered speech differ substantially from normally phonated speech and the scarcity of adequate training data...
master thesis 2023
document
Lubberding, Jari (author)
Air Traffic Control (ATC) is tasked with ensuring safe separation between aircraft in a given Controlled Traffic Region (CTR). To achieve this, an Air Traffic Controller (ATCo) verbally gives clearances using over-the-air communication. The ATCo keeps track of these clearances using so-called ‘flight-strips’, which in modern systems are...
master thesis 2023
document
Li, Zirui (author)
End-to-end Automatic Speech Recognition (ASR) systems have improved drastically in recent years and work extremely well on many large datasets. However, research shows that these models fail to capture the variability in speech production and are biased against variation caused by regionally accented speech. Moreover, ASR research on...
master thesis 2023
document
Feng, S. (author), Halpern, B.M. (author), Kudina, O. (author), Scharenborg, O.E. (author)
Practice and recent evidence show that state-of-the-art (SotA) automatic speech recognition (ASR) systems do not perform equally well for all speaker groups. Many factors can cause this bias against different speaker groups. This paper, for the first time, systematically quantifies and finds speech recognition bias against gender, age, regional...
journal article 2023
document
Wilschut, Thomas (author), Sense, Florian (author), Scharenborg, O.E. (author), van Rijn, Hedderik (author)
Cognitive models of memory retrieval aim to describe human learning and forgetting over time. Such models have been successfully applied in digital systems that aid in memorizing information by adapting to the needs of individual learners. The memory models used in these systems typically measure the accuracy and latency of typed retrieval...
conference paper 2023
document
Zhang, Yixuan (author)
One of the most important problems that needs tackling for wide deployment of Automatic Speech Recognition (ASR) is the bias in ASR, i.e., ASRs tend to generate more accurate predictions for certain speaker groups while making more errors on speech from others. In this thesis, we aim to reduce bias against non-native speakers of Dutch compared...
master thesis 2022
document
Magyari, Reka (author)
Clinical documentation takes up 40% of clinicians’ time. To ease the administrative burden on clinicians, digital scribes offer the potential to automate clinical note taking. Digital scribes are intelligent documentation software tools that combine automated speech recognition (ASR) and natural language processing (NLP). Digital scribes transcribe...
master thesis 2022
document
Chen, Hang (author), Zhou, Hengshun (author), Du, Jun (author), Lee, Chin-Hui (author), Chen, Jingdong (author), Watanabe, Shinji (author), Siniscalchi, Sabato Marco (author), Scharenborg, O.E. (author), Liu, Di-Yuan (author)
In this paper we discuss the rationale of the Multimodal Information based Speech Processing (MISP) Challenge, and provide a detailed description of the data recorded, the two evaluation tasks and the corresponding baselines, followed by a summary of submitted systems and evaluation results. The MISP Challenge aims at tackling speech processing...
conference paper 2022
document
Halpern, B.M. (author), Feng, S. (author), van Son, Rob (author), van den Brekel, Michiel (author), Scharenborg, O.E. (author)
In this paper, we introduce a new corpus of oral cancer speech and present our study on the automatic recognition and analysis of oral cancer speech. A two-hour English oral cancer speech dataset is collected from YouTube. Formulated as a low-resource oral cancer ASR task, we investigate three acoustic modelling approaches that previously...
journal article 2022
document
Feng, S. (author), Żelasko, Piotr (author), Moro-Velázquez, Laureano (author), Abavisani, Ali (author), Hasegawa-Johnson, Mark (author), Scharenborg, O.E. (author), Dehak, Najim (author)
The idea of combining multiple languages’ recordings to train a single automatic speech recognition (ASR) model brings the promise of the emergence of universal speech representation. Recently, a Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training....
conference paper 2021
document
Scholten, J.S.M. (author)
A Visually Grounded Speech model is a neural model which is trained to embed image caption pairs closely together in a common embedding space. As a result, such a model can retrieve semantically related images given a speech caption and vice versa. The purpose of this research is to investigate whether and how a Visually Grounded Speech model...
master thesis 2020
document
Scharenborg, O.E. (author), Besacier, Laurent (author), Black, Alan W. (author), Hasegawa-Johnson, Mark (author), Metze, Florian (author), Neubig, Graham (author), Stueker, Sebastian (author), Godard, Pierre (author), Mueller, M (author)
Speech technology plays an important role in our everyday life. Among others, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping. In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard...
journal article 2020
document
Moro-Velazquez, Laureano (author), Cho, JaeJin (author), Watanabe, Shinji (author), Hasegawa-Johnson, Mark A. (author), Scharenborg, O.E. (author), Kim, Heejin (author), Dehak, Najim (author)
Parkinson’s Disease (PD) affects motor capabilities of patients, who in some cases need to use human-computer assistive technologies to regain independence. The objective of this work is to study in detail the differences in error patterns from state-of-the-art Automatic Speech Recognition (ASR) systems on speech from people with and without PD....
conference paper 2019
document
Scharenborg, O.E. (author), Ebel, Patrick (author), Ciannella, Francesco (author), Hasegawa-Johnson, Mark (author), Dehak, Najim (author)
For many languages in the world, not enough (annotated) speech data is available to train an ASR system. Recently, we proposed a cross-language method for training an ASR system using linguistic knowledge and semi-supervised training. Here, we apply this approach to the low-resource language Mboshi. Using an ASR system trained on Dutch, Mboshi...
conference paper 2018
document
Chitu, A.G. (author)
In the last two decades, we have witnessed a rapid increase in computational power, governed by Moore's Law. As a side effect, faster CPUs have become more affordable. Consequently, many new “smart” devices flooded the market and made information systems widespread. The number of users of information systems has also...
doctoral thesis 2010
document
Van de Lisdonk, R.H.M. (author)
New ideas to improve automatic speech recognition have been proposed that make use of contextual user information such as gender, age and dialect. To incorporate this information into a speech recognition system, a new framework is being developed at the MMI department of the EWI faculty at the Delft University of Technology. This toolkit is called...
master thesis 2009
document
Wiggers, P. (author)
Speech is at the core of human communication. Speaking and listening come so naturally to us that we do not have to think about them at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible, not to understand speech. For computers, on the other hand, recognising speech is a daunting...
doctoral thesis 2008
document
De Boo, M. (author)
Just imagine that you are standing in the concourse of Rotterdam Central Station, and you can speak into a machine to ask it the time of the next train to Amsterdam, and an electronic voice will instantly tell you the answer, including the platform number. The TU Delft Mediamatics department has been collaborating for some years with OVR ...
journal article 2002