DPFT: Dual Perspective Fusion Transformer for Camera-Radar-Based Object Detection

Journal Article (2025)
Author(s)

F. Fent (Technische Universität München)

A. Palffy (TU Delft - Microwave Sensing, Signals & Systems)

H. Caesar (TU Delft - Intelligent Vehicles)

DOI
https://doi.org/10.1109/TIV.2024.3507538
Publication Year
2025
Language
English
Issue number
11
Volume number
10
Pages (from-to)
4929-4941
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The perception of autonomous vehicles has to be efficient, robust, and cost-effective. However, cameras are not robust against severe weather conditions, lidar sensors are expensive, and radar-based perception still lags behind the other modalities in performance. Camera-radar fusion methods have been proposed to address this gap, but they are constrained by the typical sparsity of radar point clouds and are often designed for radars without elevation information. We propose a novel camera-radar fusion approach called Dual Perspective Fusion Transformer (DPFT) to overcome these limitations. Our method operates on lower-level radar data (the radar cube) instead of processed point clouds to preserve as much information as possible, and it projects the radar data onto both the camera plane and the ground plane to exploit radars with elevation information and to simplify fusion with camera data. As a result, DPFT demonstrates state-of-the-art performance on the K-Radar dataset while showing remarkable robustness against adverse weather conditions and maintaining a low inference time.
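
The dual-perspective idea from the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example, not the authors' implementation: it assumes the radar cube is a 4D power tensor and uses simple max pooling to collapse it into a ground-plane (bird's-eye) view and a camera-plane (front) view, the two projections DPFT fuses with the camera image. The function name, axis order, bin counts, and reduction operator are all illustrative assumptions.

```python
import torch

def dual_perspective_projection(radar_cube: torch.Tensor):
    """Collapse a 4D radar tensor into two complementary 2D views.

    radar_cube: power tensor of shape (D, R, A, E) -- Doppler, range,
    azimuth, elevation bins. Axis order and the max-pooling reduction
    are assumptions for illustration, not the paper's exact pipeline.
    """
    # Remove the Doppler axis, keeping the strongest return per cell.
    power = radar_cube.amax(dim=0)   # (R, A, E)

    # Ground-plane (bird's-eye) view: collapse the elevation axis.
    ground_view = power.amax(dim=2)  # (R, A)

    # Camera-plane (front) view: collapse the range axis.
    camera_view = power.amax(dim=0)  # (A, E)

    return ground_view, camera_view

# Illustrative bin counts only; they do not match K-Radar exactly.
cube = torch.rand(16, 128, 64, 32)  # (Doppler, range, azimuth, elevation)
ground, camera = dual_perspective_projection(cube)
print(ground.shape, camera.shape)   # torch.Size([128, 64]) torch.Size([64, 32])
```

The point of the two views is that the camera-plane projection aligns pixel-wise with the image, while the ground-plane projection preserves metric range, so each branch simplifies a different part of the fusion problem.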