Leveraging Neural Acoustic Fields for Indoor Localization

Master Thesis (2025)

Author(s)

M.L. Jonker (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

K.G. Langendoen – Graduation committee member (TU Delft - Embedded Systems)

G. Lan – Mentor (TU Delft - Embedded Systems)

Nitinder Mohan – Graduation committee member (TU Delft - Networked Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Localization, Indoor Localisation, Acoustics, Implicit neural representation, Neural acoustic fields, Analysis-by-synthesis

To reference this document use:

https://resolver.tudelft.nl/uuid:c2346cdc-2d98-4c7f-b836-fb3c7c66ebf4

More Info

Publication Year

2025

Language

English

Graduation Date

20-10-2025

Awarding Institution

Delft University of Technology

Programme

Electrical Engineering | Embedded Systems

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis presents an analysis-by-synthesis approach to single-microphone indoor localization that inverts Neural Acoustic Fields (NAFs) by comparing synthesized and measured room impulse responses (RIRs). Inspired by Neural Radiance Fields (NeRFs), NAFs model RIRs as continuous functions of spatial coordinates, which enables localization by minimizing a spectral loss over candidate listener positions. To reduce computational overhead, we introduce Standard Deviation-Weighted Sampling (SDWS), which focuses on informative time-frequency bins. We also evaluate the effect of regularization on the loss landscape. The method is evaluated on the SoundSpaces (simulated, binaural) and RAF (real-world, monaural) datasets and behaves differently on the two. On RAF, it outperforms direct regression baselines (ResNet-10, NAF-Direct) in sparse-data regimes, achieving up to 32% lower mean localization error at 10% of the data. On SoundSpaces, performance is lower, likely due to the high acoustic similarity between different locations in the simulated environments. Particle swarm optimization (PSO) reduces runtime by 75% relative to grid search while improving accuracy by 14%, and SDWS cuts computation by 40× at the cost of a 22% increase in error. The approach demonstrates the potential of NAFs for localization but highlights a trade-off between inference time (5 to 200 s per query) and performance. Future work could extend the method to jointly estimate listener position and orientation, or incorporate a hybrid search algorithm for more efficient exploration of the loss space.
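The search described in the abstract can be illustrated with a minimal sketch; it is not the thesis implementation. It makes several assumptions: `toy_field` is a hypothetical analytic stand-in for a trained NAF, the spectral loss is taken to be a mean squared error over spectrogram bins, candidates lie on a regular 2-D grid, and SDWS is read as sampling time-frequency bins with probability proportional to their standard deviation across candidate positions. All function names and parameters are illustrative.

```python
import numpy as np

N_FREQ, N_TIME = 32, 64  # assumed spectrogram size (illustrative only)

def toy_field(pos):
    """Hypothetical stand-in for a trained NAF: position -> spectrogram."""
    x, y = pos
    f = np.arange(N_FREQ)[:, None]
    t = np.arange(N_TIME)[None, :]
    return np.sin(0.3 * f * x + 0.1 * t) + np.cos(0.2 * t * y + 0.05 * f)

def sdws_bins(candidate_specs, n_bins, rng):
    """Assumed reading of SDWS: draw bins with probability proportional
    to their standard deviation across the candidate positions."""
    std = candidate_specs.std(axis=0).ravel()
    prob = std / std.sum()
    return rng.choice(std.size, size=n_bins, replace=False, p=prob)

def localize(measured, candidates, n_bins=256, seed=0):
    """Grid search: return the candidate whose synthesized spectrogram
    best matches the measurement on the sampled bins."""
    rng = np.random.default_rng(seed)
    # For simplicity every candidate is synthesized in full here; a real
    # system would precompute the weights and query only sampled bins.
    specs = np.stack([toy_field(p) for p in candidates])
    bins = sdws_bins(specs, n_bins, rng)
    synthesized = specs.reshape(len(candidates), -1)[:, bins]
    losses = np.mean((synthesized - measured.ravel()[bins]) ** 2, axis=1)
    return candidates[np.argmin(losses)], losses

# Usage: place the listener on a grid point and recover it from its spectrogram.
xs = np.linspace(0.0, 4.0, 9)
candidates = np.array([(x, y) for x in xs for y in xs])
true_pos = candidates[30]
estimate, losses = localize(toy_field(true_pos), candidates)
```

In this sketch the loss is evaluated on 256 of the 2048 bins. Replacing the exhaustive loop over `candidates` with a swarm-based optimizer would correspond to the PSO variant mentioned in the abstract; that variant is not shown here.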

Files

MSc_Thesis_Mees_Jonker_-_Repos... (pdf)

(pdf | 7.98 Mb)

License info not available