Room geometry estimation from stereo recordings using neural networks

Master Thesis (2020)

Author(s)

G. Bologni (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Richard Heusdens – Mentor (TU Delft - Signal Processing Systems)

Franck Giron – Graduation committee member (Sony)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

https://resolver.tudelft.nl/uuid:fbe463ac-902b-45e8-adf7-7b530cdd0ef8

More Info

Publication Year

2020

Language

English

Graduation Date

01-03-2020

Awarding Institution

Delft University of Technology

Programme

Electrical Engineering | Circuits and Systems

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Acoustic room geometry estimation is often performed in ad hoc settings, i.e., using multiple microphones and sources distributed around the room, or assuming control over the excitation signals. To facilitate practical applications, we propose a fully convolutional network (FCN) that localizes reflective surfaces under the relaxed assumptions that (i) a compact array of only two microphones is available, (ii) emitter and receivers are not synchronized, and (iii) both the excitation signals and the impulse responses of the enclosures are unknown.
Our FCN is designed to extract spectral and temporal patterns from stereo recordings, aggregate the temporal information over time-frames, and predict the likelihood of virtual sources corresponding to reflective surfaces at specific locations. Whereas most source localization algorithms are limited to direction-of-arrival (DOA) estimation, the proposed method jointly estimates distances and DOAs. Numerical experiments confirm that the network is able to generalize to mismatched microphone array sizes, sensor directivity patterns, or audio signal types, while highlighting front-back ambiguity as a prominent source of uncertainty. When a single reflective surface is present, up to 80% of the sources are detected, while this figure approaches 50% in rectangular rooms.
Further tests on real-world recordings yield accuracy similar to that obtained with artificially reverberated speech signals, validating the generalization capabilities of the framework.
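
The notion of a virtual source corresponding to a reflective surface can be illustrated with the standard first-order image-source model. The sketch below is not the network described in the abstract; it only shows, under simplifying assumptions (2-D geometry, a single planar wall, an array centred at the origin), how a wall maps a real source to a mirrored virtual source and how that virtual source is described by a distance and a DOA. All function names and numerical values are hypothetical.

```python
import math


def image_source(source, wall_point, wall_normal):
    """Mirror a 2-D source position across a planar wall (first-order image source)."""
    nx, ny = wall_normal
    norm = math.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    # Signed distance from the source to the wall, measured along the unit normal.
    d = (source[0] - wall_point[0]) * nx + (source[1] - wall_point[1]) * ny
    # Reflect the source through the wall plane.
    return (source[0] - 2.0 * d * nx, source[1] - 2.0 * d * ny)


def range_and_doa(position, array_center=(0.0, 0.0)):
    """Return (distance in metres, DOA in degrees) of a position seen from the array centre."""
    dx = position[0] - array_center[0]
    dy = position[1] - array_center[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


# Hypothetical setup: source at (1, 2) m, wall along the line x = 3 m.
source = (1.0, 2.0)
virtual = image_source(source, wall_point=(3.0, 0.0), wall_normal=(1.0, 0.0))
distance, doa = range_and_doa(virtual)
print(virtual, round(distance, 3), round(doa, 1))
```

In this model the wall lies on the perpendicular bisector of the segment joining the real and virtual sources, which is why locating a virtual source in both distance and DOA is enough to place the corresponding reflective surface.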

Files

MSc_thesis_Giovanni_Bologni.pd... (pdf)

(pdf | 4.35 MB)

- Embargo expired on 31-01-2022

License info not available