Leveraging Neural Acoustic Fields for Indoor Localization
M.L. Jonker (TU Delft - Electrical Engineering, Mathematics and Computer Science)
K.G. Langendoen – Graduation committee member (TU Delft - Embedded Systems)
G. Lan – Mentor (TU Delft - Embedded Systems)
Nitinder Mohan – Graduation committee member (TU Delft - Networked Systems)
Abstract
This thesis presents an analysis-by-synthesis approach to single-microphone indoor localization that inverts Neural Acoustic Fields (NAFs) by comparing synthesized and measured room impulse responses (RIRs). Inspired by Neural Radiance Fields (NeRFs), NAFs model RIRs as continuous functions of spatial coordinates, which enables localization by minimizing a spectral loss over candidate listener positions. To reduce computational overhead, we introduce Standard Deviation-Weighted Sampling (SDWS), which focuses the loss on informative time-frequency bins. We also evaluate the effect of regularization on the loss landscape. Evaluated on the SoundSpaces (simulated, binaural) and RAF (real-world, monaural) datasets, the method behaves differently on each. On RAF, it outperforms the direct regression baselines (ResNet-10, NAF-Direct) in sparse-data regimes, achieving up to 32% lower mean localization error at 10% of the data. On SoundSpaces, performance is lower, likely due to the high acoustic similarity between different locations in the simulated environments. Particle swarm optimization (PSO) reduces runtime by 75% compared with grid search while improving accuracy by 14%, and SDWS cuts computation by 40× with only a 22% increase in error. The approach demonstrates the potential of NAFs for localization but highlights a trade-off between inference time (5–200 s per query) and performance. Future work could extend the method to jointly estimate listener position and orientation, or incorporate a hybrid search algorithm to explore the loss space more efficiently.
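The localization idea summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: `toy_field` is a hypothetical analytic stand-in for a trained NAF, the loss is a plain mean absolute difference, and `sdws_bins` reflects only one possible reading of SDWS (drawing time-frequency bins with probability proportional to their standard deviation across reference spectrograms). All function names and constants are assumptions made for illustration, and an exhaustive candidate search is used here rather than PSO.

```python
import math
import random

N_FREQ, N_TIME = 16, 32  # toy spectrogram size (assumption, not from the thesis)


def toy_field(position):
    """Hypothetical stand-in for a trained NAF: position -> flattened spectrogram."""
    x, y = position
    return [
        math.cos(0.3 * f * x) * math.exp(-0.05 * t * (1.0 + y))
        for f in range(N_FREQ)
        for t in range(N_TIME)
    ]


def spectral_loss(pred, target, bins=None):
    """Mean absolute difference, optionally restricted to the given bin indices."""
    idx = range(len(pred)) if bins is None else bins
    return sum(abs(pred[i] - target[i]) for i in idx) / len(idx)


def sdws_bins(spectrograms, n_bins, rng):
    """Draw bin indices (with replacement) weighted by per-bin standard deviation."""
    n = len(spectrograms)
    weights = []
    for i in range(len(spectrograms[0])):
        values = [s[i] for s in spectrograms]
        mean = sum(values) / n
        weights.append(math.sqrt(sum((v - mean) ** 2 for v in values) / n))
    return rng.choices(range(len(weights)), weights=weights, k=n_bins)


def localize(measured, candidates, bins=None):
    """Exhaustive search: return the candidate whose synthesized response fits best."""
    return min(candidates, key=lambda c: spectral_loss(toy_field(c), measured, bins))


# Usage: a 5 x 5 grid of candidate positions and a "measurement" taken at (3, 1).
candidates = [(float(x), float(y)) for x in range(5) for y in range(5)]
measured = toy_field((3.0, 1.0))
references = [toy_field(c) for c in candidates]
bins = sdws_bins(references, 64, random.Random(0))
print(localize(measured, candidates))        # search over all 512 bins
print(localize(measured, candidates, bins))  # search over 64 sampled bins
```

In this toy setup the sampled search evaluates 64 of the 512 bins per candidate, which mirrors the kind of cost reduction the abstract attributes to SDWS. The accuracy and runtime figures reported above come from the thesis experiments and are not reproduced by this sketch.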