DRIFT: Dual-Representation Inter-Fusion Transformer for Automated Driving Perception with 4D Radar Point Clouds

Master Thesis (2025)
Author(s)

S. Pei (TU Delft - Mechanical Engineering)

Contributor(s)

A. Palffy – Mentor (Perciv AI)

D.M. Gavrila – Mentor (TU Delft - Intelligent Vehicles)

H. Caesar – Graduation committee member (TU Delft - Intelligent Vehicles)

F. Fioranelli – Graduation committee member (TU Delft - Microwave Sensing, Signals & Systems)

Faculty
Mechanical Engineering
Publication Year
2025
Language
English
Graduation Date
30-07-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Radar (Radio Detection and Ranging) sensors are cost-efficient and robust under adverse weather conditions, making them an attractive component in modern automated driving perception systems, but they provide significantly sparser information about the environment than camera or LiDAR sensors. Thus, to fully exploit radars in perception solutions, it is crucial to leverage not only local but also global contextual information of the scene. However, existing 4D radar models often struggle to fully exploit both types of information, resulting in suboptimal performance. This thesis proposes DRIFT, a dual-representation model that effectively captures and fuses both local and global contexts through a dual-path architecture. The model incorporates a point path to aggregate fine-grained local features and a pillar path to encode coarse-grained global features. These two parallel paths are inter-fused via novel feature-sharing layers at multiple stages, enabling full utilization of both representations. DRIFT is evaluated on the widely used View-of-Delft (VoD) dataset and an internal dataset, demonstrating state-of-the-art performance across multiple tasks, including object detection and free-road segmentation. Notably, DRIFT achieves a mean average precision (mAP) of 52.6% on the VoD dataset (compared to 45.4% from the CenterPoint baseline), surpassing existing methods.
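The abstract's core idea, a point path for fine-grained local features and a pillar path for coarse global context, fused via feature sharing, can be illustrated with a minimal sketch. The function below is a hypothetical simplification (the thesis's actual layers, learned weights, and fusion stages are not available here): per-point features are max-pooled into pillars on a 2D grid, and each pillar's pooled feature is broadcast back and concatenated with the point's own feature.

```python
import numpy as np

def point_pillar_fusion(points, feats, grid_size=1.0):
    """Hypothetical sketch of one DRIFT-style feature-sharing stage.

    points: (N, 2+) array of point coordinates (x, y used for pillarization)
    feats:  (N, C) array of per-point (local) features
    Returns (N, 2C): each point's local feature concatenated with the
    pooled (global) feature of the pillar it falls into.
    """
    # Assign each point to a pillar by quantizing its x, y coordinates.
    cells = np.floor(points[:, :2] / grid_size).astype(int)
    keys = {tuple(c): i for i, c in enumerate(np.unique(cells, axis=0))}
    idx = np.array([keys[tuple(c)] for c in cells])
    # Pillar path: max-pool point features within each pillar (coarse context).
    pillar_feats = np.full((len(keys), feats.shape[1]), -np.inf)
    for p, f in zip(idx, feats):
        pillar_feats[p] = np.maximum(pillar_feats[p], f)
    # Inter-fusion: concatenate each point's local feature with its
    # pillar's pooled feature, so both paths inform downstream layers.
    return np.concatenate([feats, pillar_feats[idx]], axis=1)
```

In the actual model this sharing happens in both directions and at multiple stages; the one-way pool-and-broadcast above only conveys the basic representation exchange.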

Files

License info not available

File under embargo until 30-07-2027