A Generative Neural Network Model for Speech Enhancement
H.S. Kapadia (TU Delft - Electrical Engineering, Mathematics and Computer Science)
W.B. Kleijn – Mentor (TU Delft - Signal Processing Systems)
R. C. Hendriks – Coach (TU Delft - Signal Processing Systems)
Bert de Vries – Coach (GN Hearing)
Anne Hendrikse – Graduation committee member (GN Hearing)
Readers are encouraged to also listen to the audio files for the different experiments to better understand the observations and inferences made.
https://drive.google.com/drive/folders/1N4q_JkNImB5NTSTtprw8uGZ991j3E1ge?usp=sharing
Abstract
Listening in noise is a challenging problem that degrades the hearing capability of normal-hearing and, especially, hearing-impaired people. Over the last four decades, enhancing the quality and intelligibility of noise-corrupted speech by reducing the effect of noise has been addressed using statistical signal processing techniques as well as neural networks. However, the fundamental idea behind these methods is the same: to achieve the best possible estimate of a single target speech waveform. This thesis explores a different route using generative modeling with deep neural networks, where speech is artificially generated by conditioning the model on previously predicted samples and on features extracted from noisy speech. The proposed system consists of a U-Net model for enhancing the noisy features and the WaveRNN synthesizer (originally proposed for text-to-speech synthesis) re-designed for synthesizing clean-sounding speech from noisy features. Subjective results indicate that speech generated by the proposed system is preferred over listening to noisy speech; however, the improvement in intelligibility is not significant.
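The two-stage pipeline described above (feature enhancement followed by autoregressive synthesis) can be sketched at a very high level. The sketch below is an illustrative toy, not the thesis implementation: `enhance_features` stands in for the U-Net (here replaced by a simple temporal smoothing), and `synthesize` stands in for WaveRNN (here a trivial recurrence in which each output sample depends on the previous sample and the current conditioning frame). All function names, shapes, and coefficients are hypothetical.

```python
import numpy as np

def enhance_features(noisy_feats):
    # Stand-in for the U-Net feature enhancer: a toy moving-average
    # smoothing across time in place of learned denoising.
    # noisy_feats: array of shape (n_frames, n_feature_dims).
    kernel = np.ones(3) / 3.0
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 0, noisy_feats)

def synthesize(feats, n_samples, rng):
    # Stand-in for the WaveRNN synthesizer: an autoregressive loop where
    # each new sample is conditioned on the previously generated sample
    # and on the current (enhanced) feature frame.
    samples = np.zeros(n_samples)
    n_frames = feats.shape[0]
    for t in range(1, n_samples):
        frame = feats[(t * n_frames) // n_samples]  # align frames to samples
        samples[t] = (0.9 * samples[t - 1]
                      + 0.1 * np.tanh(frame.mean())
                      + 0.01 * rng.standard_normal())  # toy stochastic output
    return samples

# Hypothetical usage: noisy features in, generated waveform out.
rng = np.random.default_rng(0)
noisy_feats = rng.standard_normal((10, 4))
enhanced = enhance_features(noisy_feats)
waveform = synthesize(enhanced, 160, rng)
```

The key design point this illustrates is that the system never filters the noisy waveform directly: the waveform is generated from scratch, sample by sample, guided only by the enhanced conditioning features and the model's own previous outputs.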