Unsupervised Domain Adaptation for Multi-Modal 3D Object Detection under Asymmetric Sensor Degradation

Master Thesis (2026)

Author(s)

M.D. Yang (TU Delft - Mechanical Engineering)

Contributor(s)

S. Wang – Mentor (TU Delft - Mechanical Engineering)

J.F.P. Kooij – Mentor (TU Delft - Mechanical Engineering)

Faculty

Mechanical Engineering

Knowledge Distillation, 3D Object Detection, Unsupervised Domain Adaptation, Pseudo-labeling, Multi-Modal 3D Object Detection

To reference this document use

https://resolver.tudelft.nl/uuid:e95f5b3f-6088-47e7-b210-f9686888a5ab

Publication Year

2026

Language

English

Graduation Date

19-05-2026

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering, Biomechanical Design - BioRobotics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-modal 3D object detectors achieve state-of-the-art performance but remain brittle under asymmetric sensor degradation, for example when LiDAR point clouds become sparse in a new environment. In this thesis, we investigate unsupervised cross-modal adaptation that recovers the performance of a degraded sensor using an unaffected reference modality, without requiring target-domain labels. Using UniBEV on the nuScenes dataset, we simulate severe degradation by reducing the LiDAR resolution from 32 to 8 beams. We systematically compare two adaptation paradigms, both anchored by the reliable camera stream: output-level camera pseudo-labeling and feature-level cross-modal mapping via a Bird's-Eye-View (BEV) Attention U-Net. Our experiments show that feature mapping successfully aligns coarse spatial structures (improving LiDAR-only mAP by 5.6%) but fails to preserve fine-grained localization accuracy. In contrast, simple confidence-filtered pseudo-labeling provides a significantly stronger recovery, yielding a 13.1% mAP improvement. These findings suggest that basic feature-level alignment may be insufficient to restore fine-grained 3D detection under severe spatial degradation, and that direct output-level supervision can be a more effective and reliable strategy for cross-modal adaptation in this regime.
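
As a rough illustration of two steps named in the abstract, the sketch below shows one way to simulate a 32-to-8 beam reduction and to confidence-filter camera detections into pseudo-labels. It is not the thesis implementation: the `Detection` fields, the function names, the even-stride beam selection, and the 0.5 score threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A LiDAR point as (x, y, z, ring), where ring is the beam index (assumed layout).
Point = Tuple[float, float, float, int]


@dataclass(frozen=True)
class Detection:
    """One predicted 3D box. Fields are hypothetical, not the thesis's schema."""
    label: str
    score: float                        # detector confidence in [0, 1]
    center: Tuple[float, float, float]  # (x, y, z) in metres


def subsample_beams(points: Sequence[Point], src_beams: int = 32,
                    dst_beams: int = 8) -> List[Point]:
    """Keep evenly spaced beams (assumed strategy) to mimic a sparser LiDAR."""
    if src_beams % dst_beams != 0:
        raise ValueError("src_beams must be a multiple of dst_beams")
    stride = src_beams // dst_beams
    return [p for p in points if p[3] % stride == 0]


def filter_pseudo_labels(dets: Sequence[Detection],
                         min_score: float = 0.5) -> List[Detection]:
    """Keep only camera detections confident enough to act as pseudo-labels.

    The threshold is a placeholder; the value used in the thesis is not
    stated in the abstract.
    """
    return [d for d in dets if d.score >= min_score]


# Usage: one point per beam, then three camera detections of mixed confidence.
points = [(1.0, 0.0, 0.0, ring) for ring in range(32)]
sparse = subsample_beams(points)

camera_dets = [
    Detection("car", 0.82, (10.0, 2.0, 0.5)),
    Detection("pedestrian", 0.31, (4.0, -1.0, 0.9)),
    Detection("car", 0.50, (25.0, 0.0, 0.6)),
]
pseudo_labels = filter_pseudo_labels(camera_dets, min_score=0.5)
```

In such a setup, the retained `pseudo_labels` would stand in for ground-truth boxes when fine-tuning the LiDAR branch on the sparse target domain; raising `min_score` trades pseudo-label coverage for precision.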

Files

Mdyang-mscthesis-2026-final-wi... (pdf)

(pdf | 44.7 Mb)

License info not available