The cocktail party problem

GSVD-beamformers for speech in reverberant environments

Master Thesis (2018)
Author(s)

D.S. Hulsinga (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

AJ van der Veen – Mentor

R Heusdens – Graduation committee member

JH Weber – Graduation committee member

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2018 Derk-Jan Hulsinga
Publication Year
2018
Language
English
Graduation Date
16-02-2018
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Hearing aids, as a form of audio preprocessing, are increasingly common in everyday life. The goal of this thesis is to implement a blind approach to the cocktail party problem and to challenge some of the assumptions regularly made in the literature. We approach the problem as wideband frequency-domain blind source separation (FD-BSS). From this field of research, the common assumption of continuous activity is dropped. Instead, detection of the number of users is implemented as a preprocessing step, which ensures the appropriate number of demixing vectors for each time-frequency bin. The validity of the standard mixing model used for STFTs is challenged by looking at the response of a linear array. Source separation is achieved by demixing vectors based on the GSVD, derived in a model-based approach. While most permutation solvers offer an a posteriori solution for all users, we looked at finding local solutions for a single user. Combining this with the user identification, called the alignment step, we conclude that the permutation problem can be reduced to selecting a demixing vector for each discrete time-frequency instance.
The correlation coefficient proves to be a sufficient metric to couple reconstructions to the original data, as it selects most of the active time-frequency bins. In the far-field case, our approach performs comparably but not better. We did find that our method is much more robust against inaccuracies introduced when narrowband channels are assumed but not actually available. This is strongly exemplified by our experiment with a changing DFT size.
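As an illustration of correlation-based coupling, the following is a minimal sketch and not the thesis implementation. It assumes that each candidate reconstruction and the reference are available as STFT arrays, and that magnitude envelopes over time are what is correlated. The function name `select_by_correlation` and the array layout are hypothetical.

```python
import numpy as np

def select_by_correlation(candidates, reference):
    """Pick, per frequency bin, the candidate whose magnitude envelope
    correlates best with the reference envelope (assumed criterion).

    candidates: complex array, shape (n_candidates, n_freq, n_frames)
    reference:  complex array, shape (n_freq, n_frames)
    Returns an integer array of shape (n_freq,) with the chosen index.
    """
    cand_mag = np.abs(candidates)
    ref_mag = np.abs(reference)
    # Remove the mean over time so the sums below act as covariances.
    cand_c = cand_mag - cand_mag.mean(axis=-1, keepdims=True)
    ref_c = ref_mag - ref_mag.mean(axis=-1, keepdims=True)
    num = np.sum(cand_c * ref_c[None], axis=-1)
    den = np.linalg.norm(cand_c, axis=-1) * np.linalg.norm(ref_c, axis=-1)
    # Guard against division by zero for silent bins.
    rho = num / np.maximum(den, 1e-12)
    return np.argmax(rho, axis=0)
```

Calling `select_by_correlation(candidates, reference)` returns one candidate index per frequency bin. Applying the same idea per time-frequency block would only change the axis over which the correlation is taken.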
The Frobenius norm was suggested as a measure of distance between the estimated STFT and the original signal's time-frequency domain description, but it gave counterintuitive results that did not correspond with the other metrics used in this thesis. It is expected that there are effects induced by changing the size of the STFT which are not accounted for.
Our demixing vectors achieve intelligibility, measured by STOI, comparable to that of the compared techniques, and they are more robust against smaller sample sizes than the theoretically SINR-optimal MVDR.
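To give a feel for how a demixing vector can be obtained by jointly diagonalizing two sets of observations, here is a hedged sketch. It does not reproduce the derivation in the thesis. Instead it uses the closely related generalized eigenvalue decomposition (GEVD) of two covariance matrices, whose generalized eigenvalues are the squared ratios of the generalized singular values of the underlying data matrices. The function name `gevd_demixing_vector`, and the assumption that one covariance matrix contains the target plus interference and the other interference only, are illustrative.

```python
import numpy as np

def gevd_demixing_vector(R_a, R_b):
    """Return (w, lam) with w maximizing (w^H R_a w) / (w^H R_b w).

    R_a: Hermitian (M, M) covariance, assumed target plus interference.
    R_b: Hermitian positive definite (M, M) covariance, assumed
         interference only.
    """
    # Whiten with the Cholesky factor of R_b, where R_b = L L^H.
    L = np.linalg.cholesky(R_b)
    L_inv = np.linalg.inv(L)
    A = L_inv @ R_a @ L_inv.conj().T
    # Enforce Hermitian symmetry against rounding errors.
    A = 0.5 * (A + A.conj().T)
    eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                    # principal eigenvector
    # Map back so that R_a w = lam * R_b w.
    w = L_inv.conj().T @ v
    return w, eigvals[-1]
```

For a single target with steering vector `a`, so that `R_a` equals a rank-one term plus `R_b`, the returned `w` is parallel to `inv(R_b) @ a`, which is the direction of the MVDR beamformer. With exact covariances the two approaches therefore coincide up to scaling, and differences only appear when the covariances are estimated from a finite number of samples.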

Files

Thesis_dhulsinga.pdf
(pdf | 2.62 Mb)
License info not available