On speech enhancement in very low SNRs for smart speakers

Master Thesis (2018)
Author(s)

K.A. Sachos (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

R. Heusdens – Mentor

Martin Bo Møller – Mentor

Pablo Martinez-Nuevo – Mentor

Jesper Kjaer Nielsen – Mentor

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2018 Kostas Sachos
Publication Year
2018
Language
English
Graduation Date
19-10-2018
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering
Sponsors
Bang and Olufsen A/S
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Human interaction with a smart speaker often involves distant automatic speech recognition (ASR). However, ASR becomes a difficult task at high noise levels. To achieve high ASR accuracy, most commercial smart speakers attenuate the playback signal once the preset keyword is detected. In an effort to remove the need for this function, this thesis considers a speech enhancement technique in the front end of the ASR system that aims to suppress the dominant noise component in the degraded speech signal. Because the playback signal is known a priori, adaptive filtering is a well-suited speech enhancement technique. Therefore, the class of least mean squares (LMS) algorithms is studied and assessed. Among the techniques of this class, the transform domain LMS (TDLMS), owing to its inherent signal decorrelation properties, is shown to achieve the best performance in terms of noise suppression, speech intelligibility, and word error rate. The results of this study are based on a set of simulations incorporating real impulse responses measured in both an anechoic and a reverberant environment.
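
To illustrate the idea of using the known playback signal as a reference for adaptive noise cancellation, the following is a minimal sketch of a normalized LMS (NLMS) canceller. It is not the thesis implementation and it is not the TDLMS variant the thesis favours; it is simply a basic member of the LMS class. The function name nlms_cancel, the filter length, the step size, the three-tap room response, and the sinusoid standing in for speech are all assumptions made for illustration.

```python
import math
import random


def nlms_cancel(reference, microphone, num_taps=8, step=0.2, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `microphone`.

    Returns the enhanced signal (the filter error) and the final weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    enhanced = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what remains after removing the playback estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: gradient step normalized by the reference energy
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        enhanced.append(error)
    return enhanced, weights


def power(signal):
    """Mean squared value of a sequence."""
    return sum(v * v for v in signal) / len(signal)


random.seed(0)
n = 4000
playback = [random.gauss(0.0, 1.0) for _ in range(n)]  # known playback (assumed white)
speech = [0.1 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]  # stand-in for speech
room = [0.6, -0.3, 0.1]  # hypothetical loudspeaker-to-microphone impulse response

# Playback as picked up by the microphone, i.e. the dominant noise component
echo = [
    sum(room[j] * playback[k - j] for j in range(len(room)) if k - j >= 0)
    for k in range(n)
]
microphone = [s + e for s, e in zip(speech, echo)]

enhanced, weights = nlms_cancel(playback, microphone)

# Compare the noise power before and after cancellation on the second half,
# once the filter has had time to adapt.
tail = n // 2
noise_before = power(echo[tail:])
noise_after = power([e - s for e, s in zip(enhanced[tail:], speech[tail:])])
```

In this toy setting the reference is white, so a plain NLMS filter adapts quickly. Real playback such as music is strongly correlated, which is the situation where the decorrelation properties mentioned in the abstract become relevant.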
