A Generative Neural Network Model for Speech Enhancement

Master's Thesis (2019)
Author(s)

H.S. Kapadia (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

W.B. Kleijn – Mentor (TU Delft - Signal Processing Systems)

R. C. Hendriks – Coach (TU Delft - Signal Processing Systems)

Bert de Vries – Coach (GN Hearing)

Anne Hendrikse – Graduation committee member (GN Hearing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2019 Husain Kapadia
Publication Year
2019
Language
English
Graduation Date
20-09-2019
Awarding Institution
Delft University of Technology
Sponsors
None
Related content

Readers are encouraged to also listen to the audio files for different experiments to understand the observations and inferences made.

https://drive.google.com/drive/folders/1N4q_JkNImB5NTSTtprw8uGZ991j3E1ge?usp=sharing
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Listening in noise is a challenging problem that degrades the hearing capability not only of normal-hearing but especially of hearing-impaired people. Over the last four decades, enhancing the quality and intelligibility of noise-corrupted speech by reducing the effect of noise has been addressed using statistical signal processing techniques as well as neural networks. However, the fundamental idea behind these methods is the same: to achieve the best possible estimate of a single target speech waveform. This thesis explores a different route using generative modeling with deep neural networks, where speech is artificially generated by conditioning the model on previously predicted samples and on features extracted from noisy speech. The proposed system consists of a U-Net model for enhancing the noisy features and the WaveRNN synthesizer (originally proposed for text-to-speech synthesis), re-designed to synthesize clean-sounding speech from noisy features. Subjective results indicate that speech generated by the proposed system is preferred over listening to noisy speech; however, the improvement in intelligibility is not significant.
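The two-stage pipeline described above (a U-Net that enhances noisy features, followed by an autoregressive WaveRNN-style synthesizer conditioned on those features and on its own previous outputs) can be sketched in miniature. This is an illustrative toy only, not the thesis implementation: `enhance_features` stands in for the U-Net with a simple spectral floor, and `synthesize` replaces the WaveRNN's learned sample distribution with a placeholder recurrence; all function names and parameters here are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the U-Net feature enhancer: a simple
# spectral floor that zeroes low-energy (noise-dominated) feature bins.
def enhance_features(noisy_features, floor=0.1):
    mask = noisy_features > floor
    return noisy_features * mask

# Hypothetical stand-in for the WaveRNN synthesizer: an autoregressive
# loop in which each output sample depends on the previous prediction
# and the enhanced conditioning feature for the current frame.
def synthesize(features, samples_per_frame=4):
    prev = 0.0
    out = []
    for feat in features:
        for _ in range(samples_per_frame):
            # A real WaveRNN predicts a sample distribution from
            # (prev, feat); here a toy linear recurrence is used instead.
            prev = 0.5 * prev + 0.5 * np.tanh(feat)
            out.append(prev)
    return np.array(out)

noisy = np.array([0.05, 0.8, 0.02, 0.6])  # toy noisy feature frames
clean = enhance_features(noisy)           # -> [0.0, 0.8, 0.0, 0.6]
audio = synthesize(clean)
print(audio.shape)                        # (16,): 4 frames x 4 samples
```

The key point the sketch preserves is the structural difference from classical enhancement: the output waveform is generated sample by sample from conditioning features, rather than estimated directly from the noisy signal.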

Files

MSc_Thesis_A_Generative_Neural... (pdf)
(pdf | 20.7 Mb)
- Embargo expired on 01-09-2020
License info not available