Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving

Conference Paper (2023)

Author(s)

Yang Lou (City University of Hong Kong)

Qun Song (TU Delft - Embedded Systems)

Qian Xu (City University of Hong Kong)

Rui Tan (Nanyang Technological University)

Jianping Wang (City University of Hong Kong, TU Delft - Microwave Sensing, Signals & Systems)

Research Group

Embedded Systems

DOI related publication

https://doi.org/10.3233/FAIA230441

To reference this document use:

https://resolver.tudelft.nl/uuid:532af9e5-dc99-4cd3-9c96-c5a10f809b03

Publication Year

2023

Language

English

Pages (from-to)

1593-1600

ISBN (electronic)

9781643684369

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-modal fusion has shown promising initial results for object detection in autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, effective solutions for incorporating uncertainties into multi-modal fusion are still lacking, primarily because uncertainties are not comparable across modalities and have distinct sensitivities to various adverse conditions. To fill this gap, this paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE), which explicitly incorporates single-modal uncertainties into LiDAR-camera fusion. UMoE uses an individual expert network to process each sensor's detection result together with its encoded uncertainty. The expert networks' outputs are then analyzed by a gating network to determine the fusion weights. The proposed UMoE module can be integrated into any proposal fusion pipeline. Evaluation shows that UMoE achieves maximum performance gains of 10.67%, 3.17%, and 5.40% over state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial attack, and blinding attack scenarios, respectively.
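
The expert-plus-gating idea described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the function names (`expert`, `fuse`), the single-number "encoded uncertainty", and the hand-picked weights are all assumptions made for illustration, whereas the paper's expert and gating networks would be learned. The sketch only shows how per-modality outputs that combine a detection score with an uncertainty value could be turned into fusion weights by a softmax gate.

```python
import math


def expert(score, uncertainty, w_score, w_unc):
    """Toy stand-in for an expert network (hypothetical): a fixed linear
    map of one modality's detection score and uncertainty, squashed by tanh."""
    return math.tanh(w_score * score + w_unc * uncertainty)


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(lidar, camera):
    """Fuse two (score, uncertainty) pairs, each value assumed to lie in [0, 1].

    Returns the fused score and the pair of fusion weights (lidar, camera).
    """
    # Assumed, hand-picked weights: a higher score raises the expert output,
    # a higher uncertainty lowers it. Real weights would come from training.
    h_lidar = expert(lidar[0], lidar[1], w_score=2.0, w_unc=-3.0)
    h_camera = expert(camera[0], camera[1], w_score=2.0, w_unc=-3.0)

    # Toy gate: softmax over the expert outputs gives the fusion weights.
    weights = softmax([h_lidar, h_camera])
    fused = weights[0] * lidar[0] + weights[1] * camera[0]
    return fused, weights


if __name__ == "__main__":
    # Same detection score from both sensors, but the camera is less certain.
    fused, weights = fuse(lidar=(0.9, 0.1), camera=(0.9, 0.8))
    print(fused, weights)
```

In this toy setting, the modality with the higher uncertainty receives the smaller fusion weight, which is the qualitative behavior the abstract motivates for a sensor operating under adverse conditions.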

Files

FAIA_372_FAIA230441.pdf

(pdf | 1.62 Mb)