Search results | TU Delft Repositories

Distance measures for speech recognition

Hunt, M.J. (author), Lefèbvre (author)

This report is concerned with the application of aspects of statistical pattern classification to speech recognition. It presents an extension of linear discriminant analysis to the case where the classes are unknown. This extension provides solutions to the interrelated problems of the design of acoustic representations and spectral distance...

report 1989

Lip-reading automatons: Multimodal speech recognition

De Boo, M. (author)

Just imagine that you are standing in the concourse of Rotterdam Central Station, and you can speak into a machine to ask it the time of the next train to Amsterdam, and an electronic voice will instantly tell you the answer, including the platform number. The TU Delft Mediamatics department has been collaborating for some years with OVR ...

journal article 2002

An audio-visual corpus for multimodal speech recognition in Dutch language

Wojdel, J. (author), Wiggers, P. (author), Rothkrantz, L.J.M. (author)

This paper describes the gathering and availability of an audio-visual speech corpus for the Dutch language. The corpus was prepared with multimodal speech recognition in mind and is currently used in our research on lip-reading and bimodal speech recognition. It contains the prompts also used in the well-established POLYPHONE corpus and...

conference paper 2002

Modelling context in automatic speech recognition

Wiggers, P. (author)

Speech is at the core of human communication. Speaking and listening come so naturally to us that we do not have to think about them at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible, not to understand speech. For computers, on the other hand, recognising speech is a daunting...

doctoral thesis 2008

Automatic speech recognition using dynamic Bayesian networks

Van de Lisdonk, R.H.M. (author)

New ideas to improve automatic speech recognition have been proposed that make use of context user information such as gender, age and dialect. To incorporate this information into a speech recognition system, a new framework is being developed at the MMI department of the EWI faculty at the Delft University of Technology. This toolkit is called...

master thesis 2009

Building a visual speech recognizer

Driel, K.F. (author)

This thesis describes how an automatic lip reader was realized. Visual speech recognition is a precondition for more robust speech recognition in general. The development of the software comprised the following steps: gathering of training data, extracting meaningful features from the obtained video material, training the speech recognizer and...

master thesis 2009

Speech-based automatic closed caption alignment

Boogaard, J.A. (author)

In the Netherlands, four million people watch television programs with closed captions because they are hearing impaired or non-native speakers. Closed captions contain Dutch speech transcriptions and non-speech sound descriptions and are displayed as subtitles. Due to government obligation, the number of television programs that must be closed...

master thesis 2010

Learning the Model Structure of Dynamic Bayesian Networks for Automated Speech Recognition

Harahap, G. (author)

Improving the performance of an Automated Speech Recognition system requires incorporating more knowledge into its model. Information such as the context of the conversation and the characteristics of the speaker can make the task of recognizing speech more accurate. The challenge is how this knowledge can be...

master thesis 2010

Towards Robust Visual Speech Recognition: Automatic Systems for Lip Reading of Dutch

Chitu, A.G. (author)

In the last two decades we have witnessed a rapid increase in computational power, governed by Moore's Law. As a side effect, faster CPUs became more affordable as well. As a result, many new “smart” devices flooded the market and made information systems widespread. The number of users of information systems has also...

doctoral thesis 2010

Mini games for educative speech recognition

Kol, T.R. (author), Renkens, I.M. (author)

Bachelor thesis on an internship in Singapore where minigames were developed for a game company that was developing an educative speech recognition game to teach English pronunciation to Chinese speakers.

bachelor thesis 2011

Deriving content-specific measures of room acoustic perception using a binaural, nonlinear auditory model

Van Dorp Schuitman, J. (author), De Vries, D. (author), Lindau, A. (author)

Acousticians generally assess the acoustic qualities of a concert hall or any other room using impulse response-based measures such as the reverberation time, clarity index, and others. These parameters are used to predict perceptual attributes related to the acoustic qualities of the room. Various studies show that these physical measures are...

journal article 2013

Inclusive design for post-lingual hearing impaired people: A way to generate order, structure and awareness, during meetings

Ramirez Nates, C. (author)

This is a Design for Interaction (DfI) master project of the Industrial Design Engineering (IDE) Faculty of TU Delft. The research was developed in collaboration with Oorzaak (a Dutch company) and the Faculty of Humanities of Leiden University, Centre for Linguistics. The main topics are communication between hearing impaired (HI) and hearing...

master thesis 2015

Towards Natural Language Understanding using Multimodal Deep Learning

Bos, S. (author)

This thesis describes how multimodal sensor data from a 3D sensor and microphone array can be processed with deep neural networks such that its fusion, the trained neural network, a) is more robust to noise, b) outperforms unimodal recognition and c) enhances unimodal recognition in the absence of multimodal data. We built a framework for a complete...

master thesis 2017

Limits on Modeling Compensation in Multimodal DNNs for Audio Visual Speech Recognition

Chandrasekharan Nair, Sreejith (author)

Speech is a natural way of communicating that does not require us to develop any new skills in order to be able to interact with electronic devices. With the evolution of technology, speech has become one of the primary means of communication. Speech recognition is a form of multimedia content analysis, where the information carried in a speech...

master thesis 2017

Building an ASR System for Mboshi Using A Cross-language Definition of Acoustic Units Approach

Scharenborg, O.E. (author), Ebel, Patrick (author), Ciannella, Francesco (author), Hasegawa-Johnson, Mark (author), Dehak, Najim (author)

For many languages in the world, not enough (annotated) speech data is available to train an ASR system. Recently, we proposed a cross-language method for training an ASR system using linguistic knowledge and semi-supervised training. Here, we apply this approach to the low-resource language Mboshi. Using an ASR system trained on Dutch, Mboshi...

conference paper 2018

Study of the performance of automatic speech recognition systems in speakers with Parkinson’s Disease

Moro-Velazquez, Laureano (author), Cho, JaeJin (author), Watanabe, Shinji (author), Hasegawa-Johnson, Mark A. (author), Scharenborg, O.E. (author), Kim, Heejin (author), Dehak, Najim (author)

Parkinson’s Disease (PD) affects motor capabilities of patients, who in some cases need to use human-computer assistive technologies to regain independence. The objective of this work is to study in detail the differences in error patterns from state-of-the-art Automatic Speech Recognition (ASR) systems on speech from people with and without PD....

conference paper 2019

Command Recognition on Intermittently-Powered Devices

Schilder, Patrick (author)

The Internet of Things (IoT) is expected to include billions of tiny devices that collect, process, and communicate sensory data. As of now, batteries power these devices. Batteries, however, are large, expensive, and short-lived; even the rechargeable ones wear out in a few years. Therefore, they are not a sustainable powering solution. Tiny...

master thesis 2019

Computer-Based Social Anxiety Regulation in Virtual Reality Exposure Therapy

Hartanto, D. (author)

Social anxiety disorder (SAD), commonly referred to as social phobia, is an immense and unreasonable fear of social interaction. Cognitive behaviour therapy (CBT) is the most thoroughly studied nonpharmacologic approach to the treatment of SAD patients. In CBT, patients are gradually, in vivo, exposed to anxiety-provoking...

doctoral thesis 2019

Speech technology for unwritten languages

Scharenborg, O.E. (author), Besacier, Laurent (author), Black, Alan W. (author), Hasegawa-Johnson, Mark (author), Metze, Florian (author), Neubig, Graham (author), Stueker, Sebastian (author), Godard, Pierre (author), Mueller, M (author)

Speech technology plays an important role in our everyday life. Among other things, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping. In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard...

journal article 2020

That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Żelasko, Piotr (author), Moro-Velázquez, Laureano (author), Hasegawa-Johnson, Mark (author), Scharenborg, O.E. (author), Dehak, Najim (author)

Only a handful of the world’s languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some...

conference paper 2020
