DRIFT: Dual-Representation Inter-Fusion Transformer for Automated Driving Perception with 4D Radar Point Clouds

Master Thesis (2025)
Author(s)

S. Pei (TU Delft - Mechanical Engineering)

Contributor(s)

A. Palffy – Mentor (Perciv AI)

D.M. Gavrila – Mentor (TU Delft - Intelligent Vehicles)

H. Caesar – Graduation committee member (TU Delft - Intelligent Vehicles)

F. Fioranelli – Graduation committee member (TU Delft - Microwave Sensing, Signals & Systems)

Faculty
Mechanical Engineering
Publication Year
2025
Language
English
Graduation Date
30-07-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Radar (Radio Detection and Ranging) sensors are cost-efficient and robust under adverse weather conditions, making them an attractive component in modern automated driving perception systems, but they provide significantly sparser information about the environment than camera or LiDAR sensors. Thus, to fully exploit radars in perception solutions, it is crucial to leverage not only local but also global contextual information of the scene. However, existing 4D radar models often struggle to fully exploit both types of information, resulting in suboptimal performance. This thesis proposes DRIFT, a dual-representation model that effectively captures and fuses both local and global contexts through a dual-path architecture. The model incorporates a point path to aggregate fine-grained local features and a pillar path to encode coarse-grained global features. These two parallel paths are inter-fused via novel feature-sharing layers at multiple stages, enabling full utilization of both representations. DRIFT is evaluated on the widely used View-of-Delft (VoD) dataset and an internal dataset, demonstrating state-of-the-art performance across multiple tasks, including object detection and free-road segmentation. Notably, DRIFT achieves a mean average precision (mAP) of 52.6% on the VoD dataset (compared to 45.4% from the CenterPoint baseline), surpassing existing methods.
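The abstract's core idea, a point path for fine-grained local features and a pillar path for coarse global context, fused via feature sharing, can be illustrated with a minimal sketch. The function below is a hypothetical simplification (the thesis's actual layers, learned weights, and fusion stages are not available here): per-point features are max-pooled into pillars on a 2D grid, and each pillar's pooled feature is broadcast back and concatenated with the point's own feature.

```python
import numpy as np

def point_pillar_fusion(points, feats, grid_size=1.0):
    """Hypothetical sketch of one DRIFT-style feature-sharing stage.

    points: (N, 2+) array of point coordinates (x, y used for pillarization)
    feats:  (N, C) array of per-point (local) features
    Returns (N, 2C): each point's local feature concatenated with the
    pooled (global) feature of the pillar it falls into.
    """
    # Assign each point to a pillar by quantizing its x, y coordinates.
    cells = np.floor(points[:, :2] / grid_size).astype(int)
    keys = {tuple(c): i for i, c in enumerate(np.unique(cells, axis=0))}
    idx = np.array([keys[tuple(c)] for c in cells])
    # Pillar path: max-pool point features within each pillar (coarse context).
    pillar_feats = np.full((len(keys), feats.shape[1]), -np.inf)
    for p, f in zip(idx, feats):
        pillar_feats[p] = np.maximum(pillar_feats[p], f)
    # Inter-fusion: concatenate each point's local feature with its
    # pillar's pooled feature, so both paths inform downstream layers.
    return np.concatenate([feats, pillar_feats[idx]], axis=1)
```

In the actual model this sharing happens in both directions and at multiple stages; the one-way pool-and-broadcast above only conveys the basic representation exchange.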

Files

License info not available

File under embargo until 30-07-2027