D. Gavrila | TU Delft Repository

Free Space Segmentation using Automotive Radar

Conference paper (2025) - M. Hassan, A. Palffy, F. Fioranelli, A. Yarovoy, S. Ravindran, D. Gavrila

A data-driven method is proposed to obtain free space segmentation from automotive radar point clouds. It aggregates automotive radar detection points from multiple timestamps, projects them into a Bird's-Eye-View (BEV) grid-based representation, and applies a semantic segmentation Neural Network (NN) to classify each grid cell for free space segmentation. Lidar-based supervision is used to generate the ground truth for training. Moreover, debris objects are manually annotated to enable the NN model to learn to detect these uncommon objects. Experimental results on a proprietary 4D Imaging Radar dataset demonstrate that the proposed method improves free space segmentation compared to baseline methods. ...
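The aggregation and BEV projection steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the grid extent, the 0.5 m cell size, and the assumption that all frames are already expressed in a common ego-motion-compensated frame are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x forward, y left) in metres, common frame assumed


def aggregate_frames(frames: Sequence[Sequence[Point]]) -> List[Point]:
    """Concatenate detections from several timestamps into one point list."""
    return [p for frame in frames for p in frame]


def points_to_bev_grid(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 40.0),    # hypothetical extent
    y_range: Tuple[float, float] = (-20.0, 20.0),  # hypothetical extent
    cell_size: float = 0.5,                        # hypothetical resolution
) -> List[List[int]]:
    """Count detections per BEV cell; points outside the extent are dropped."""
    n_rows = int(round((x_range[1] - x_range[0]) / cell_size))
    n_cols = int(round((y_range[1] - y_range[0]) / cell_size))
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        row = min(int((x - x_range[0]) / cell_size), n_rows - 1)
        col = min(int((y - y_range[0]) / cell_size), n_cols - 1)
        grid[row][col] += 1
    return grid


frames = [[(1.0, 0.0), (1.2, 0.1)], [(10.0, -5.0), (100.0, 0.0)]]
grid = points_to_bev_grid(aggregate_frames(frames))
```

A grid like this (possibly with extra per-cell channels) would then be the input to a segmentation network; that part is outside the scope of the sketch.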

Scenario-based motion planning with bounded probability of collision

Journal article (2025) - Oscar de Groot, Laura Ferranti, Dariu M. Gavrila, Javier Alonso-Mora

Robots will increasingly operate near humans that introduce uncertainties in the motion planning problem due to their complex nature. Optimization-based planners typically avoid humans through collision avoidance chance constraints. This allows the planner to optimize performance while guaranteeing probabilistic safety. However, existing real-time methods do not consider the actual probability of collision for the planned trajectory but rather its marginalization, that is, the independent collision probabilities for each planning step and/or dynamic obstacle, resulting in conservative trajectories. To address this issue, we introduce a novel real-time capable method termed Safe Horizon MPC that explicitly constrains the joint probability of collision with all obstacles over the duration of the motion plan. This is achieved by reformulating the chance-constrained planning problem using scenario optimization and predictive control. Out of sampled realizations of human motion, we identify which cases affect the optimization. This allows us to certify the planned trajectory in real-time. Our method is less conservative than state-of-the-art approaches, applicable to arbitrary probability distributions of the obstacles’ trajectories, computationally tractable and scalable. We demonstrate our proposed approach using a mobile robot and an autonomous vehicle in an environment shared with humans. ...
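The difference between the joint collision probability of a plan and its per-step marginals can be illustrated with sampled scenarios. The sketch below is not Safe Horizon MPC; it only estimates both quantities empirically from a handful of hand-written obstacle trajectories. The function names, the collision radius, and the sample data are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def collision_flags(robot: Sequence[Point], obstacle: Sequence[Point],
                    radius: float) -> List[bool]:
    """Per-step collision indicator for one sampled obstacle trajectory."""
    return [math.hypot(rx - ox, ry - oy) < radius
            for (rx, ry), (ox, oy) in zip(robot, obstacle)]


def joint_and_marginals(robot: Sequence[Point],
                        scenarios: Sequence[Sequence[Point]],
                        radius: float) -> Tuple[float, List[float]]:
    """Empirical joint collision probability and per-step marginals."""
    flags = [collision_flags(robot, s, radius) for s in scenarios]
    n = len(scenarios)
    joint = sum(any(f) for f in flags) / n  # collision at any step
    marginals = [sum(f[k] for f in flags) / n for k in range(len(robot))]
    return joint, marginals


robot_plan = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
far = (5.0, 5.0)
scenarios = [
    [far, (1.0, 0.1), (2.0, 0.1)],  # collides at steps 1 and 2
    [far, far, far],                # never collides
    [far, far, (2.0, 0.2)],         # collides at step 2 only
    [far, far, far],                # never collides
]
joint, marginals = joint_and_marginals(robot_plan, scenarios, radius=0.5)
print(joint, marginals, sum(marginals))
```

Here the joint estimate is 0.5 while the per-step marginals sum to 0.75. By the union bound the sum of marginals never underestimates the joint probability, so a planner that budgets risk step by step can be more conservative than one that constrains the joint probability directly.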

Camera- and LiDAR-based Person Re-Identification

Conference paper (2025) - S.A. Krebs, D. Gavrila

In this paper, we introduce a novel method for creating appearance embeddings to identify individual persons using an object re-identification (ReID) framework. We present CLFormer (Camera LiDAR Transformer), a transformer-based architecture that incorporates multi-modal data from both camera and LiDAR sensors. We introduce the 3D Cuboid-Inclusive Point Embedding (3D-CIPE), which leverages rich data from LiDAR point clouds and 3D cuboids to add a learnable embedding into the transformer structure. Additionally, through ablation studies, we explore and analyze various strategies for the early and late fusion of multi-modal input data. To evaluate our proposed CLFormer, we reinterpret the nuScenes dataset [1] for ReID purposes and use it for our experiments. Our method demonstrates a significant improvement in performance, outperforming the image-only baseline with an increase of 2.3 in mean Average Precision (mAP). ...

Road User Specific Trajectory Prediction in Mixed Traffic Using Map Data

Journal article (2025) - Hidde J.H. Boekema, Emran Yasser Moustafa, Julian F.P. Kooij, Dariu M. Gavrila

This paper studies road user trajectory prediction in mixed traffic, i.e. where vehicles and Vulnerable Road Users (VRUs, i.e. pedestrians, cyclists and other riders) closely share a common road space. We investigate if typical prediction components (scene graph representation, scene encoding, waypoint prediction, motion dynamics) should be specific to each road user class. Using the recent VRU-heavy View-of-Delft Prediction (VoD-P) dataset, we study several directions to improve the performance of the state-of-the-art map-based prediction models (PGP, TNT) in urban settings. First, we consider the use of class-specific map representations. Second, we investigate if the weights of different components of the model should be shared or separated by class. Finally, we augment VoD-P training data with automatically extracted trajectories from the 360-degree LiDAR scans by the recording vehicle. This data is made publicly available. We find that pre-training the model on auto-labels and making it class-specific leads to a reduction of up to 22.2%, 20.0%, and 18.2% in minADE (K = 10 samples) for pedestrians, cyclists, and vehicles, respectively. ...
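For readers unfamiliar with the minADE metric reported above, a minimal sketch follows. It assumes the common definition (the lowest average displacement error over K predicted trajectories); the function name and the toy data are illustrative and not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def min_ade(predictions: Sequence[Sequence[Point]],
            ground_truth: Sequence[Point]) -> float:
    """Minimum over K predictions of the mean per-step Euclidean error."""
    def ade(pred: Sequence[Point]) -> float:
        return sum(math.dist(p, g) for p, g in zip(pred, ground_truth)) / len(ground_truth)
    return min(ade(pred) for pred in predictions)


ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predictions = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # constant 1 m offset, ADE = 1.0
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.3)],  # only the last step is off
]
score = min_ade(predictions, ground_truth)
```

Because only the best of the K samples counts, the metric rewards models that cover the correct mode among several hypotheses rather than models whose single most likely prediction is accurate.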

A Vehicle System for Navigating Among Vulnerable Road Users Including Remote Operation

Conference paper (2025) - O. De Groot, A. Bertipaglia, F. Tajdari, S. Wang, Z. Xia, M. Zaffar, R. Ensing, M. Garzon, J. Alonso-Mora, H. Caesar, L. Ferranti, R. Happee, H. Boekema, J. F.P. Kooij, G. Papaioannou, B. Shyrokau, D. M. Gavrila, V. Jain, M. Kegl, V. Kotian, T. Lentsch, Y. Lin, C. Messiou, E. Schippers

We present a vehicle system capable of navigating safely and efficiently around Vulnerable Road Users (VRUs), such as pedestrians and cyclists. The system comprises key modules for environment perception, localization and mapping, motion planning, and control, integrated into a prototype vehicle. A key innovation is a motion planner based on Topology-driven Model Predictive Control (T-MPC). The guidance layer generates multiple trajectories in parallel, each representing a distinct strategy for obstacle avoidance or non-passing. The underlying trajectory optimization constrains the joint probability of collision with VRUs under generic uncertainties. To address extraordinary situations ('edge cases') that go beyond the autonomous capabilities - such as construction zones or encounters with emergency responders - the system includes an option for remote human operation, supported by visual and haptic guidance. In simulation, our motion planner outperforms three baseline approaches in terms of safety and efficiency. We also demonstrate the full system in prototype vehicle tests on a closed track, both in autonomous and remotely operated modes. ...

SAM-Maps: Road Map Generation for Automated Vehicles in Urban Areas

Conference paper (2025) - M. P. van Andel, H. J. -H. Boekema, D. M. Gavrila

Automated Vehicles (AVs) rely on up-to-date map information to inform trajectory prediction and planning modules, but these maps are expensive to obtain and update as they are usually annotated by humans. We propose SAM-Maps, a method for automatically generating road maps from aerial images of urban areas that takes advantage of the power of foundation models, requiring no human annotation or additional training to map unseen areas. This method extracts a coarse road graph from the images and then estimates the geometry of the roads from this graph. We evaluate our model on the challenging road layouts of the recent View-of-Delft Prediction dataset by comparing the maps generated using our model to the human-annotated maps, achieving an IoU of 33.3% with our automatic method and an IoU of 56.1% with some human corrections in our method. We also evaluate a trajectory prediction model on our maps to test whether they are sufficiently accurate for downstream tasks. The performance of this model using the map from our automatic method is 37.9% better on the minADE6 metric than not using map data as input. To the best of our knowledge, this is the first method that extracts both the drivable area and road connections of European urban areas from aerial images. The code will be publicly released for research purposes. ...
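The IoU comparison between a generated map and a human-annotated map can be pictured as an overlap score between two rasterized drivable-area masks. The sketch below assumes binary masks of equal size; the function name, the rasterization, and the convention of returning 1.0 for two empty masks are assumptions for the example, not details from the paper.

```python
from typing import Sequence


def mask_iou(pred: Sequence[Sequence[int]], ref: Sequence[Sequence[int]]) -> float:
    """Intersection over Union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            intersection += bool(p and r)
            union += bool(p or r)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


generated = [[1, 1, 0],
             [0, 1, 0]]
annotated = [[1, 0, 0],
             [0, 1, 1]]
overlap = mask_iou(generated, annotated)
```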

Topology-Driven Parallel Trajectory Optimization in Dynamic Environments

Journal article (2024) - Oscar De Groot, Laura Ferranti, Dariu M. Gavrila, Javier Alonso-Mora

Ground robots navigating in complex, dynamic environments must compute collision-free trajectories to avoid obstacles safely and efficiently. Nonconvex optimization is a popular method to compute a trajectory in real-time. However, these methods often converge to locally optimal solutions and frequently switch between different local minima, leading to inefficient and unsafe robot motion. In this work, we propose a novel topology-driven trajectory optimization strategy for dynamic environments that plans multiple distinct evasive trajectories to enhance the robot's behavior and efficiency. A global planner iteratively generates trajectories in distinct homotopy classes. These trajectories are then optimized by local planners working in parallel. While each planner shares the same navigation objectives, they are locally constrained to a specific homotopy class, meaning each local planner attempts a different evasive maneuver. The robot then executes the feasible trajectory with the lowest cost in a receding horizon manner. We demonstrate, on a mobile robot navigating among pedestrians, that our approach leads to faster trajectories than existing planners. ...
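The final selection step described above, executing the feasible trajectory with the lowest cost among the parallel local planners, can be sketched in a few lines. The `PlanResult` type, its fields, and the class labels are hypothetical; the global planner and the per-class optimization are not modelled here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PlanResult:
    homotopy_class: str  # hypothetical label for the evasive maneuver
    cost: float          # objective value reported by the local planner
    feasible: bool       # whether the local optimization found a valid plan


def select_trajectory(results: Sequence[PlanResult]) -> Optional[PlanResult]:
    """Return the lowest-cost feasible result, or None if none is feasible."""
    feasible = [r for r in results if r.feasible]
    return min(feasible, key=lambda r: r.cost) if feasible else None


results = [
    PlanResult("pass_left", 3.2, True),
    PlanResult("pass_right", 2.1, False),  # cheapest, but infeasible
    PlanResult("follow_behind", 4.0, True),
]
best = select_trajectory(results)
```

In a receding-horizon setting this selection would be repeated at every control cycle with freshly optimized candidates.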

Multi-Class Trajectory Prediction in Urban Traffic Using the View-of-Delft Prediction Dataset

Journal article (2024) - Hidde J.H. Boekema, Bruno K.W. Martens, Julian F.P. Kooij, Dariu M. Gavrila

This letter presents View-of-Delft Prediction, a new dataset for trajectory prediction, to address the lack of on-board trajectory datasets in urban mixed-traffic environments. View-of-Delft Prediction builds on the recently released urban View-of-Delft (VoD) dataset to make it suitable for trajectory prediction. Unique features of this dataset are the challenging road layouts of Delft, with many narrow roads and bridges, and the close proximity between vehicles and Vulnerable Road Users (VRUs). It contains a large proportion of VRUs, with 569 prediction instances for vehicles, 347 for cyclists, and 934 for pedestrians. We additionally provide high-definition map annotations for the VoD dataset to enable state-of-the-art prediction models to be used. We analyse two state-of-the-art trajectory prediction models, PGP and P2T, which originally were developed for vehicle-dominated traffic scenarios, to assess the strengths and weaknesses of current modelling approaches in mixed traffic settings with large numbers of VRUs. Our analysis shows that there is a significant domain gap between the vehicle-dominated nuScenes and VRU-dominated VoD Prediction datasets. The dataset is publicly released for non-commercial research purposes. ...

A Deep Automotive Radar Detector Using the RaDelft Dataset

Journal article (2024) - I. Roldan Montero, A. Palffy, J.F.P. Kooij, D. Gavrila, F. Fioranelli, Alexander Yarovoy

The detection of multiple extended targets in complex environments using high-resolution automotive radar is considered. A data-driven approach is proposed where unlabeled synchronized lidar data are used as ground truth to train a neural network (NN) with only radar data as input. To this end, the novel, large-scale, real-life, and multisensor RaDelft dataset has been recorded using a demonstrator vehicle in different locations in the city of Delft, The Netherlands. The dataset, as well as the documentation and example code, is publicly available for those researchers in the field of automotive radar or machine perception. The proposed data-driven detector can generate lidar-like point clouds (PCs) using only radar data from a high-resolution system, which preserves the shape and size of extended targets. The results are compared against conventional constant false alarm rate (CFAR) detectors as well as variations of the method to emulate the available approaches in the literature, using the probability of detection, the probability of false alarm, and the Chamfer distance (CD) as performance metrics. Moreover, an ablation study was carried out to assess the impact of Doppler and temporal information on detection performance. The proposed method outperforms different baselines in terms of CD, achieving a reduction of 77% against conventional CFAR detectors and 28% against the modified state-of-the-art deep learning (DL)-based approaches. ...
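The Chamfer distance used above as a performance metric compares two point clouds through nearest-neighbour distances. The sketch below uses one common symmetric variant (the mean nearest-neighbour distance in each direction, summed); the paper's exact definition may differ, for example by using squared distances. It is a brute-force implementation suited only to small clouds.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance both ways."""
    def one_way(src: Sequence[Point3], dst: Sequence[Point3]) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


radar_like = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
lidar_like = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
cd = chamfer_distance(radar_like, lidar_like)
```

In this toy case every radar-like point has an exact lidar match, so the whole distance comes from the one lidar point that the radar-like cloud misses.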

See Further Than CFAR: A Data-Driven Radar Detector Trained by Lidar

Conference paper (2024) - Ignacio Roldan, Andras Palffy, Julian F.P. Kooij, Dariu M. Gavrila, Francesco Fioranelli, Alexander Yarovoy

In this paper, we address the limitations of traditional constant false alarm rate (CFAR) target detectors in automotive radars, particularly in complex urban environments with multiple objects that appear as extended targets. We propose a data-driven radar target detector exploiting a highly efficient 2D CNN backbone inspired by the computer vision domain. Our approach is distinguished by a unique cross-sensor supervision pipeline, enabling it to learn exclusively from unlabeled synchronized radar and lidar data, thus eliminating the need for costly manual object annotations. Using a novel large-scale, real-life multi-sensor dataset recorded in various driving scenarios, we demonstrate that the proposed detector generates dense, lidar-like point clouds, achieving a lower Chamfer distance to the reference lidar point clouds than CFAR detectors. Overall, it significantly outperforms CFAR baselines in detection accuracy. ...

IntrApose: Monocular Driver 6 DOF Head Pose Estimation Leveraging Camera Intrinsics

Journal article (2023) - Markus Roth, Dariu M. Gavrila

We present intrApose, a novel method for continuous 6 DOF head pose estimation from a single camera image without prior detection or landmark localization. We argue that using camera intrinsics alongside the intensity information is essential for accurate pose estimation. The proposed head pose estimation framework is crop-aware and scale-aware, i.e., it keeps poses estimated within image cut-outs consistent with the whole image. It employs a continuous, differentiable rotation representation that simplifies the overall architecture compared to existing methods. Our method is validated on DD-Pose, a challenging real-world in-vehicle driver observation dataset that offers a broad spectrum of poses and occlusion states from naturalistic driving scenarios. In ablation studies we compare rotation and translation errors of intrinsics-aware and -agnostic methods, continuous and discontinuous rotation representations, and data sampling strategies. Experiments show that leveraging camera intrinsics and a continuous rotation representation (SVDO+) results in a balanced mean angular error (BMAE) of 5.8° compared to the intrinsics-agnostic baseline with a discontinuous rotation representation (14.8°). Furthermore, training with an unbiased data distribution (most driver measurements are close-to-frontal) improved BMAE on the hard subset (extreme orientations and occlusions) from 15.3° to 9.5°. ...
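Why intrinsics matter for crop- and scale-awareness can be seen from standard pinhole-camera bookkeeping: cropping shifts the principal point and resizing scales all four intrinsic parameters. The sketch below is not the intrApose method; it only checks that projecting a 3D point with adjusted intrinsics agrees with projecting into the full image and then cropping and resizing the pixel. It assumes continuous pixel coordinates in which resizing maps u to scale * u, and all names and numbers are illustrative.

```python
from typing import Tuple

Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy) in pixels


def project(k: Intrinsics, point: Tuple[float, float, float]) -> Tuple[float, float]:
    """Pinhole projection of a 3D point given in camera coordinates."""
    fx, fy, cx, cy = k
    x, y, z = point
    return cx + fx * x / z, cy + fy * y / z


def crop_and_resize_intrinsics(k: Intrinsics, x0: float, y0: float,
                               scale: float) -> Intrinsics:
    """Intrinsics of a cut-out with top-left corner (x0, y0), resized by scale."""
    fx, fy, cx, cy = k
    return fx * scale, fy * scale, (cx - x0) * scale, (cy - y0) * scale


full = (1000.0, 1000.0, 640.0, 360.0)  # hypothetical full-image intrinsics
head = (0.2, -0.1, 2.0)                # hypothetical 3D head position in metres
x0, y0, scale = 600.0, 200.0, 0.5      # hypothetical crop origin and resize factor

u, v = project(full, head)
pixel_in_crop = ((u - x0) * scale, (v - y0) * scale)
pixel_from_adjusted = project(crop_and_resize_intrinsics(full, x0, y0, scale), head)
```

A network that sees only the cut-out without these adjusted parameters cannot tell where the crop sat in the full image, which is one way to read the abstract's argument for feeding intrinsics alongside intensity information.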

Globally Guided Trajectory Planning in Dynamic Environments

Conference paper (2023) - O.M. de Groot, L. Ferranti, D. Gavrila, J. Alonso-Mora

Navigating mobile robots through environments shared with humans is challenging. From the perspective of the robot, humans are dynamic obstacles that must be avoided. These obstacles make the collision-free space nonconvex, which leads to two distinct passing behaviors per obstacle (passing left or right). For local planners, such as receding-horizon trajectory optimization, each behavior presents a local optimum in which the planner can get stuck. This may result in slow or unsafe motion even when a better plan exists. In this work, we identify trajectories for multiple locally optimal driving behaviors, by considering their topology. This identification is made consistent over successive iterations by propagating the topology information. The most suitable high-level trajectory guides a local optimization-based planner, resulting in fast and safe motion plans. We validate the proposed planner on a mobile robot in simulation and real-world experiments. ...

Detecting darting out pedestrians with occlusion aware sensor fusion of radar and stereo camera

Journal article (2023) - Andras Palffy, Julian F.P. Kooij, Dariu M. Gavrila

Early and accurate detection of crossing pedestrians is crucial in automated driving in order to perform timely emergency manoeuvres. However, this is a difficult task in urban scenarios where pedestrians are often occluded (not visible) behind objects, e.g., other parked vehicles. We propose an occlusion aware fusion of stereo camera and radar sensors to address scenarios with crossing pedestrians behind such parked vehicles. Our proposed method adapts both the expected rate and properties of detections in different areas according to the visibility of the sensors. In our experiments on a real-world dataset, we show that the proposed occlusion aware fusion of radar and stereo camera detects the crossing pedestrians on average 0.26 seconds earlier than using the camera alone, and 0.15 seconds earlier than fusing the sensors without occlusion information. Our dataset containing 501 relevant recordings of pedestrians behind vehicles will be publicly available on our website for non-commercial, scientific use. ...

Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision

Conference paper (2023) - Fangqiang Ding, Andras Palffy, Dariu Gavrila, Chris Xiaoxuan Lu

This work proposes a novel approach to 4D radar-based scene flow estimation via cross-modal learning. Our approach is motivated by the co-located sensing redundancy in modern autonomous vehicles. Such redundancy implicitly provides various forms of supervision cues to the radar scene flow estimation. Specifically, we introduce a multi-task model architecture for the identified cross-modal learning problem and propose loss functions to opportunistically engage scene flow estimation using multiple cross-modal constraints for effective model training. Extensive experiments show the state-of-the-art performance of our method and demonstrate the effectiveness of cross-modal supervised learning to infer more accurate 4D radar scene flow. We also show its usefulness to two subtasks - motion segmentation and ego-motion estimation. Our source code will be available at https://github.com/Toytiny/CMFlow. ...

Fast and Compact Image Segmentation using Instance Stixels

Journal article (2022) - Thomas Hehn, Julian F.P. Kooij, Dariu M. Gavrila

State-of-the-art stixel methods fuse dense stereo disparity and semantic class information, e.g. from a Convolutional Neural Network (CNN), into a compact representation of driveable space, obstacles and background. However, they do not explicitly differentiate instances within the same semantic class. We investigate several ways to augment single-frame stixels with instance information, which can be extracted by a CNN from the RGB image input. As a result, our novel Instance Stixels method efficiently computes stixels that account for boundaries of individual objects, and represents instances as grouped stixels that express connectivity. Experiments on the Cityscapes dataset demonstrate that including instance information into the stixel computation itself, rather than as a post-processing step, increases the segmentation performance (i.e. Intersection over Union and Average Precision). This holds especially for overlapping objects of the same class. Furthermore, we show the superiority of our approach in terms of segmentation performance and computational efficiency compared to combining the separate outputs of Semantic Stixels and a state-of-the-art pixel-level CNN. We achieve processing throughput of 28 frames per second on average for 8 pixel wide stixels on images from the Cityscapes dataset at 1792x784 pixels. Our Instance Stixels software is made freely available for non-commercial research purposes. ...

Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data

Journal article (2022) - Christoph Rist, David Emmerichs, Markus Enzweiler, Dariu Gavrila

Semantic scene completion is the task of jointly estimating 3D geometry and semantics of objects and surfaces within a given extent. This is a particularly challenging task on real-world data that is sparse and occluded. We propose a scene segmentation network based on local Deep Implicit Functions as a novel learning-based method for scene completion. Unlike previous work on scene completion, our method produces a continuous scene representation that is not based on voxelization. We encode raw point clouds into a latent space locally and at multiple spatial resolutions. A global scene completion function is subsequently assembled from the localized function patches. We show that this continuous representation is suitable to encode geometric and semantic properties of extensive outdoor scenes without the need for spatial discretization (thus avoiding the trade-off between level of scene detail and the scene extent that can be covered). We train and evaluate our method on semantically annotated LiDAR scans from the Semantic KITTI dataset. Our experiments verify that our method generates a powerful representation that can be decoded into a dense 3D description of a given scene. The performance of our method surpasses the state of the art on the Semantic KITTI Scene Completion Benchmark in terms of geometric completion intersection-over-union (IoU). ...

Multi-class Road User Detection with 3+1D Radar in the View-of-Delft Dataset

Journal article (2022) - Andras Palffy, Ewoud Pool, Srimannarayana Baratam, Julian Kooij, Dariu Gavrila

Next-generation automotive radars provide elevation data in addition to range-, azimuth- and Doppler velocity. In this experimental study, we apply a state-of-the-art object detector (PointPillars), previously used for LiDAR 3D data, to such 3+1D radar data (where 1D refers to Doppler). In ablation studies, we first explore the benefits of the additional elevation information, together with that of Doppler, radar cross section and temporal accumulation, in the context of multi-class road user detection. We subsequently compare object detection performance on the radar and LiDAR point clouds, object class-wise and as a function of distance. To facilitate our experimental study, we present the novel View-of-Delft (VoD) automotive dataset. It contains 8693 frames of synchronized and calibrated 64-layer LiDAR-, (stereo) camera-, and 3+1D radar-data acquired in complex, urban traffic. It consists of 123106 3D bounding box annotations of both moving and static objects, including 26587 pedestrian, 10800 cyclist and 26949 car labels. Our results show that object detection on 64-layer LiDAR data still outperforms that on 3+1D radar data, but the addition of elevation information and integration of successive radar scans helps close the gap. The VoD dataset is made freely available for scientific benchmarking. ...

Driver and Pedestrian Mutual Awareness for Path Prediction and Collision Risk Estimation

Journal article (2022) - Markus Roth, Jork Stapel, Riender Happee, Dariu M. Gavrila

We present a novel method for vehicle-pedestrian path prediction that takes into account the awareness of the driver and the pedestrian towards each other. The method jointly models the paths of vehicle and pedestrian within a single Dynamic Bayesian Network (DBN). In this DBN, sub-graphs model the environment and entity-specific context cues of the vehicle and pedestrian (incl. awareness), which affect their future motion and allow the prediction horizon to be increased. These sub-graphs share a latent state which models whether vehicle and pedestrian are on collision course; this accounts for a certain degree of motion coupling. The method was validated with real-world data obtained by onboard vehicle sensing (stereo vision, GNSS and proprioceptive). Data consist of 93 vehicle and pedestrian encounters, spanning various awareness conditions and dynamic characteristics of the participants. In ablation studies, we quantify the benefits of various components of our proposed DBN model for path prediction and collision risk estimation. Results show that at a prediction horizon of 1.5 s, context-aware models outperform context-agnostic models in path prediction for scenarios with a dynamics change, while performing similarly otherwise. Results further indicate that driver-attention-aware models improve collision risk estimation compared to driver-agnostic models. ...

Learning to Predict Motion from Raw 3D Object Detections

Conference paper (2022) - C. Neumeyer, Mario Bijelic, D. Gavrila

We show how to design a motion prediction algorithm that works with 3D object detections and map locations. In particular, we obtain object IDs – even though the training data does not contain any object IDs – across multiple time-steps into the future by propagating a Gaussian Mixture of likely object (e.g., vehicle) locations through time. We validate our approach on the nuScenes dataset. First, we find that a motion prediction algorithm without tracking IDs performs as well as a motion prediction algorithm with tracking IDs in the training data. Second, the 3D labels of an on-board perception system are inferior (e.g., loss of detections, positional uncertainty) to those generated by offline labelling (automatic labelling pipeline, manual labelling). Even so, we find that a moderate increase in the size of the training data offsets the deterioration in prediction performance (with no additional offline labelling). ...

A Joint Extrinsic Calibration Tool for Radar, Camera and Lidar

Journal article (2021) - Joris Domhof, Julian F.P. Kooij, Dariu M. Gavrila

We address joint extrinsic calibration of lidar, camera and radar sensors. To simplify calibration, we propose a single calibration target design for all three modalities, and implement our approach in an open-source tool with bindings to Robot Operating System (ROS). Our tool features three optimization configurations, namely using error terms for a minimal number of sensor pairs, or using terms for all sensor pairs in combination with loop closure constraints, or by adding terms for structure estimation in a probabilistic model. Apart from relative calibration where relative transformations between sensors are computed, our work also addresses absolute calibration that includes calibration with respect to the mobile robot's body. Two methods are compared to estimate the body reference frame using an external laser scanner, one based on markers and the other based on manual annotation of the laser scan. In the experiments, we evaluate the three configurations for relative calibration. Our results show that using terms for all sensor pairs is most robust, especially for lidar to radar, when at least five board locations are used. For absolute calibration the median rotation error around the vertical axis reduces from 1° before calibration, to 0.33° using the markers and 0.02° with manual annotations. ...