The cocktail party problem

GSVD-beamformers for speech in reverberant environments

Master Thesis (2018)

Author(s)

D.S. Hulsinga (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A-J. van der Veen – Mentor

R. Heusdens – Graduation committee member

JH Weber – Graduation committee member

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Permutation problem, Speech separation, Blind source separation, Generalized Singular Value Decomposition, Cocktail Party Problem

To reference this document use:

https://resolver.tudelft.nl/uuid:60a67ca0-6110-480a-bc10-5f7bd5908029

More Info

Publication Year

2018

Language

English

Graduation Date

16-02-2018

Awarding Institution

Delft University of Technology

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Hearing aids, as a form of audio preprocessing, are increasingly common in everyday life. The goal of this thesis is to implement a blind approach to the cocktail party problem and to challenge some of the regular assumptions made in the literature. We approach the problem as wideband frequency-domain blind source separation (FD-BSS). The assumption of continuous activity, common in this field of research, is dropped. Instead, detection of the number of users is implemented as a preprocessing step to ensure the appropriate number of demixing vectors for each time-frequency bin. The validity of the standard mixing model used for STFTs is challenged by looking at the response of a linear array. Source separation is achieved by demixing vectors based on the GSVD, derived in a model-based approach. While most permutation solvers offer an a posteriori solution for all users, we looked at finding local solutions for a single user. Combining this with the user identification, called the alignment step, we conclude that the permutation problem can be reduced to selecting a demixing vector for each discrete time-frequency instance.
The correlation coefficient proves to be a sufficient metric to couple reconstructions to the original data, as it selects most of the active time-frequency bins. In the far-field case, our approach performs comparably but not better. We did find that our method is much more robust against inaccuracies introduced when narrowband channels are assumed but not actually available. This is strongly exemplified by our experiment with a changing DFT size.
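As an illustration of coupling reconstructions to original data by correlation coefficient, the following is a minimal sketch and not the thesis implementation. It assumes each signal is summarized by a real-valued magnitude envelope over time frames; the names `pearson` and `best_match` and the example values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length real sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def best_match(reference, candidates):
    """Return (index, scores): the candidate that correlates best with the reference."""
    scores = [pearson(reference, c) for c in candidates]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores


# Hypothetical magnitude envelopes over six time frames (assumed data).
reference = [0.1, 0.9, 0.8, 0.1, 0.7, 0.2]
candidates = [
    [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],  # unrelated activity pattern
    [0.2, 1.8, 1.6, 0.2, 1.4, 0.4],  # scaled copy of the reference
]
index, scores = best_match(reference, candidates)
print(index, scores)
```

Because the correlation coefficient is invariant to scaling, the scaled copy is selected even though its amplitude differs from the reference.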
The Frobenius norm was suggested as a measure of distance between the estimated STFT and the original signal's time-frequency domain description, but it gave counterintuitive results that did not correspond with the other metrics used in this thesis. It is expected that changing the size of the STFT induces effects that are not accounted for.
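For reference, a Frobenius-norm distance between two STFT matrices can be sketched as below. This is an assumed, generic formulation (the square root of the summed squared magnitudes of the element-wise difference), not code taken from the thesis; the function name `frobenius_distance` and the example matrices are hypothetical.

```python
import math


def frobenius_distance(estimate, reference):
    """Frobenius norm of the difference of two equally sized complex matrices."""
    if len(estimate) != len(reference):
        raise ValueError("matrices must have the same number of rows")
    total = 0.0
    for row_e, row_r in zip(estimate, reference):
        if len(row_e) != len(row_r):
            raise ValueError("matrices must have the same number of columns")
        # Sum of squared magnitudes of the element-wise differences.
        total += sum(abs(e - r) ** 2 for e, r in zip(row_e, row_r))
    return math.sqrt(total)


# Hypothetical 2x2 STFT matrices (rows: frequency bins, columns: time frames).
reference = [[1 + 0j, 1j], [0j, 2 + 0j]]
estimate = [[4 + 0j, 1j], [4j, 2 + 0j]]
distance = frobenius_distance(estimate, reference)
print(distance)
```

This sketch only accepts matrices of identical dimensions, so it compares STFTs computed with the same settings.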
Our demixing vectors achieve intelligibility, measured by STOI, comparable to that of the compared techniques, and they are more robust against smaller sample sizes than the theoretically SINR-optimal MVDR.

Files

Thesis_dhulsinga.pdf

(pdf | 2.62 Mb)

License info not available