Unsupervised Domain Adaptation for Multi-Modal 3D Object Detection under Asymmetric Sensor Degradation

Master Thesis (2026)
Author(s)

M.D. Yang (TU Delft - Mechanical Engineering)

Contributor(s)

S. Wang – Mentor (TU Delft - Mechanical Engineering)

J.F.P. Kooij – Mentor (TU Delft - Mechanical Engineering)

Faculty
Mechanical Engineering
Publication Year
2026
Language
English
Graduation Date
19-05-2026
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering, Biomechanical Design - BioRobotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-modal 3D object detectors achieve state-of-the-art performance but remain brittle under asymmetric sensor degradation, for example when LiDAR point clouds become sparse in new environments. In this thesis, we investigate unsupervised cross-modal adaptation, which uses an unaffected reference modality to recover the performance of a degraded sensor without requiring target-domain labels. Using UniBEV on the nuScenes dataset, we simulate severe degradation by reducing the LiDAR resolution from 32 to 8 beams. We systematically compare two adaptation paradigms, both anchored by the reliable camera stream: output-level camera pseudo-labeling and feature-level cross-modal mapping via a Bird's-Eye-View (BEV) Attention U-Net. Our experiments show that feature mapping aligns coarse spatial structures (improving LiDAR-only mAP by 5.6%) but fails to preserve performance on fine-grained localization metrics. In contrast, simple confidence-filtered pseudo-labeling provides a substantially stronger recovery, yielding a 13.1% mAP improvement. These findings suggest that basic feature-level alignment may be insufficient to restore fine-grained 3D detection under severe spatial degradation, and that direct output-level supervision can be a more effective and reliable strategy for cross-modal adaptation in this regime.
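
Two of the steps named in the abstract, the 32-to-8 beam reduction and the confidence filtering of camera pseudo-labels, can be illustrated with a minimal sketch. This is not the thesis implementation: the `Detection` container, the function names, the rule of keeping every fourth ring, and the 0.5 score threshold are all assumptions made for illustration, since the abstract does not state which beams are kept or which threshold is used.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """One camera-branch 3D box prediction (hypothetical container)."""
    label: str
    score: float               # detector confidence in [0, 1]
    box: Tuple[float, ...]     # e.g. (x, y, z, w, l, h, yaw)


def subsample_beams(points: Sequence[Tuple[float, ...]],
                    keep_every: int = 4) -> List[Tuple[float, ...]]:
    """Simulate a lower-resolution LiDAR by dropping whole beams.

    Each point is assumed to be (x, y, z, intensity, ring), where ring is
    the beam index. Keeping every 4th ring of a 32-beam sweep leaves
    8 beams; the choice of which rings to keep is an assumption.
    """
    return [p for p in points if int(p[4]) % keep_every == 0]


def filter_pseudo_labels(detections: Sequence[Detection],
                         threshold: float = 0.5) -> List[Detection]:
    """Keep only camera predictions confident enough to act as labels.

    The surviving boxes would then supervise the degraded LiDAR branch
    in place of ground-truth annotations. The threshold is an assumption.
    """
    return [d for d in detections if d.score >= threshold]
```

In such a setup, the filtered camera detections would stand in for target-domain annotations when fine-tuning the detector on the 8-beam point clouds; a stricter threshold trades label quantity for label quality.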

Files

License info not available