D. Gavrila
Robots will increasingly operate near humans, whose complex nature introduces uncertainties into the motion planning problem. Optimization-based planners typically avoid humans through collision avoidance chance constraints. This allows the planner to optimize performance while guaranteeing probabilistic safety. However, existing real-time methods do not consider the actual probability of collision for the planned trajectory but rather its marginalization, that is, the independent collision probabilities for each planning step and/or dynamic obstacle, resulting in conservative trajectories. To address this issue, we introduce a novel real-time capable method termed Safe Horizon MPC that explicitly constrains the joint probability of collision with all obstacles over the duration of the motion plan. This is achieved by reformulating the chance-constrained planning problem using scenario optimization and predictive control. From sampled realizations of human motion, we identify the samples that affect the optimization. This allows us to certify the planned trajectory in real time. Our method is less conservative than state-of-the-art approaches, applicable to arbitrary probability distributions of the obstacles’ trajectories, computationally tractable and scalable. We demonstrate our proposed approach using a mobile robot and an autonomous vehicle in an environment shared with humans.
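To illustrate why constraining per-step (marginal) probabilities is conservative, the sketch below estimates both quantities from sampled obstacle trajectories for one fixed plan. It is a plain Monte Carlo estimate, not the scenario-optimization certificate of Safe Horizon MPC; the disc collision model and the function names are assumptions for illustration only.

```python
import random

def collides(robot_xy, obstacle_xy, radius):
    """Disc collision check: True if centres are closer than the combined radius."""
    dx = robot_xy[0] - obstacle_xy[0]
    dy = robot_xy[1] - obstacle_xy[1]
    return dx * dx + dy * dy < radius * radius

def collision_probabilities(plan, scenarios, radius):
    """Estimate the joint collision probability over the whole plan and the
    sum of per-step (marginal) probabilities, which upper-bounds it (Boole).

    plan: list of robot (x, y) positions, one per planning step.
    scenarios: sampled obstacle trajectories, each a list of (x, y) with len(plan) entries.
    """
    n = len(scenarios)
    joint_hits = 0
    per_step_hits = [0] * len(plan)
    for traj in scenarios:
        hit_any = False
        for k, (p, o) in enumerate(zip(plan, traj)):
            if collides(p, o, radius):
                per_step_hits[k] += 1
                hit_any = True
        joint_hits += hit_any  # a scenario counts once, however many steps collide
    joint = joint_hits / n
    marginal_sum = sum(h / n for h in per_step_hits)
    return joint, marginal_sum

if __name__ == "__main__":
    rng = random.Random(0)
    plan = [(0.5 * k, 0.0) for k in range(10)]
    # hypothetical pedestrian crossing the robot's path with uncertain speed
    scenarios = []
    for _ in range(1000):
        v = rng.gauss(0.4, 0.1)
        scenarios.append([(2.5, -2.0 + v * k) for k in range(10)])
    print(collision_probabilities(plan, scenarios, radius=0.6))
```

Because a single colliding scenario typically violates several consecutive steps, the marginal sum counts it several times; a planner that splits its risk budget over steps therefore has to keep more distance than the joint probability actually requires.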
This paper studies road user trajectory prediction in mixed traffic, i.e. where vehicles and Vulnerable Road Users (VRUs, i.e. pedestrians, cyclists and other riders) closely share a common road space. We investigate whether typical prediction components (scene graph representation, scene encoding, waypoint prediction, motion dynamics) should be specific to each road user class. Using the recent VRU-heavy View-of-Delft Prediction (VoD-P) dataset, we study several directions to improve the performance of the state-of-the-art map-based prediction models (PGP, TNT) in urban settings. First, we consider the use of class-specific map representations. Second, we investigate whether the weights of different components of the model should be shared or separated by class. Finally, we augment the VoD-P training data with trajectories automatically extracted from the recording vehicle's 360-degree LiDAR scans. This data is made publicly available. We find that pre-training the model on these auto-labels and making it class-specific reduces minADE (K = 10 samples) by up to 22.2%, 20.0%, and 18.2% for pedestrians, cyclists, and vehicles, respectively.
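For readers unfamiliar with the metric, minADE with K samples scores only the best of the K predicted trajectories for each instance. The sketch below assumes 2D positions and equal-length predicted and ground-truth trajectories; averaging over all instances of a class is left to the caller.

```python
import math

def ade(pred, gt):
    """Average Euclidean displacement between two equal-length 2D trajectories."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def min_ade(predictions, gt, k=10):
    """minADE_K: lowest ADE among the first k predicted trajectories of one instance."""
    return min(ade(pred, gt) for pred in predictions[:k])

if __name__ == "__main__":
    gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    predictions = [
        [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # veers left
        [(0.0, 0.0), (0.9, 0.0), (1.8, 0.0)],  # slightly too slow
    ]
    print(min_ade(predictions, gt, k=10))
```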
We present a vehicle system capable of navigating safely and efficiently around Vulnerable Road Users (VRUs), such as pedestrians and cyclists. The system comprises key modules for environment perception, localization and mapping, motion planning, and control, integrated into a prototype vehicle. A key innovation is a motion planner based on Topology-driven Model Predictive Control (T-MPC). The guidance layer generates multiple trajectories in parallel, each representing a distinct strategy for obstacle avoidance or non-passing. The underlying trajectory optimization constrains the joint probability of collision with VRUs under generic uncertainties. To address extraordinary situations ('edge cases') that go beyond the autonomous capabilities, such as construction zones or encounters with emergency responders, the system includes an option for remote human operation, supported by visual and haptic guidance. In simulation, our motion planner outperforms three baseline approaches in terms of safety and efficiency. We also demonstrate the full system in prototype vehicle tests on a closed track, in both autonomous and remotely operated modes.
This letter presents View-of-Delft Prediction, a new dataset for trajectory prediction, to address the lack of on-board trajectory datasets in urban mixed-traffic environments. View-of-Delft Prediction builds on the recently released urban View-of-Delft (VoD) dataset to make it suitable for trajectory prediction. Unique features of this dataset are the challenging road layouts of Delft, with many narrow roads and bridges, and the close proximity between vehicles and Vulnerable Road Users (VRUs). It contains a large proportion of VRUs, with 569 prediction instances for vehicles, 347 for cyclists, and 934 for pedestrians. We additionally provide high-definition map annotations for the VoD dataset to enable state-of-the-art prediction models to be used. We analyse two state-of-the-art trajectory prediction models, PGP and P2T, which were originally developed for vehicle-dominated traffic scenarios, to assess the strengths and weaknesses of current modelling approaches in mixed traffic settings with large numbers of VRUs. Our analysis shows that there is a significant domain gap between the vehicle-dominated nuScenes and VRU-dominated VoD Prediction datasets. The dataset is publicly released for non-commercial research purposes.
Ground robots navigating in complex, dynamic environments must compute collision-free trajectories to avoid obstacles safely and efficiently. Nonconvex optimization is a popular method to compute a trajectory in real-time. However, these methods often converge to locally optimal solutions and frequently switch between different local minima, leading to inefficient and unsafe robot motion. In this work, we propose a novel topology-driven trajectory optimization strategy for dynamic environments that plans multiple distinct evasive trajectories to enhance the robot's behavior and efficiency. A global planner iteratively generates trajectories in distinct homotopy classes. These trajectories are then optimized by local planners working in parallel. While each planner shares the same navigation objectives, they are locally constrained to a specific homotopy class, meaning each local planner attempts a different evasive maneuver. The robot then executes the feasible trajectory with the lowest cost in a receding horizon manner. We demonstrate, on a mobile robot navigating among pedestrians, that our approach leads to faster trajectories than existing planners.
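The final selection step described above (execute the feasible trajectory with the lowest cost) can be sketched as follows. The `Candidate` record and its fields are hypothetical; the homotopy-class generation and the parallel local optimizations are assumed to have already produced one candidate per planner.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Candidate:
    homotopy_id: int  # evasive maneuver (homotopy class) the local planner was constrained to
    feasible: bool    # whether the local optimizer returned a constraint-satisfying solution
    cost: float       # objective value of the optimized trajectory

def select_trajectory(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the lowest-cost feasible candidate, or None if no planner succeeded.

    In a receding-horizon loop this is called every control cycle; the caller
    must supply its own fallback (e.g. braking) when None is returned.
    """
    feasible = [c for c in candidates if c.feasible]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.cost)

if __name__ == "__main__":
    cands = [Candidate(0, True, 5.0), Candidate(1, False, 1.0), Candidate(2, True, 3.0)]
    print(select_trajectory(cands))
```

Keeping several distinct maneuvers alive and choosing among them each cycle is what lets the robot avoid committing to a single local minimum of the nonconvex problem.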
See Further Than CFAR: A Data-Driven Radar Detector Trained by Lidar
In this paper, we address the limitations of traditional constant false alarm rate (CFAR) target detectors in automotive radars, particularly in complex urban environments with multiple objects that appear as extended targets. We propose a data-driven radar target detector exploiting a highly efficient 2D CNN backbone inspired by the computer vision domain. Our approach is distinguished by a unique cross-sensor supervision pipeline, enabling it to learn exclusively from unlabeled synchronized radar and lidar data, thus eliminating the need for costly manual object annotations. Using a novel large-scale, real-life multi-sensor dataset recorded in various driving scenarios, we demonstrate that the proposed detector generates dense, lidar-like point clouds, achieving a lower Chamfer distance to the reference lidar point clouds than CFAR detectors. Overall, it significantly outperforms the CFAR baselines in detection accuracy.
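The Chamfer distance used for the comparison measures how well two point clouds cover each other. Definitions vary between works (squared versus plain distances, sum versus mean, halving the total); the sketch below assumes the mean nearest-neighbour Euclidean distance in each direction, summed, and uses brute-force search for clarity rather than a k-d tree.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed convention: mean nearest-neighbour Euclidean distance from a to b
    plus the same quantity from b to a. Brute force, O(len(a) * len(b)).
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

if __name__ == "__main__":
    radar_points = [(10.0, 1.0, 0.5), (20.0, -2.0, 0.3)]  # hypothetical detections
    lidar_points = [(10.1, 1.0, 0.5), (19.8, -2.1, 0.4), (30.0, 0.0, 1.0)]
    print(chamfer_distance(radar_points, lidar_points))
```

The two directions penalize different failures: the radar-to-lidar term grows with spurious detections, while the lidar-to-radar term grows with missed structure, which is why a dense, lidar-like output can lower the total.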
Early and accurate detection of crossing pedestrians is crucial in automated driving in order to perform timely emergency manoeuvres. However, this is a difficult task in urban scenarios where pedestrians are often occluded (not visible) behind objects, e.g., other parked vehicles. We propose an occlusion-aware fusion of stereo camera and radar sensors to address scenarios with crossing pedestrians behind such parked vehicles. Our proposed method adapts both the expected rate and properties of detections in different areas according to the visibility of the sensors. In our experiments on a real-world dataset, we show that the proposed occlusion-aware fusion of radar and stereo camera detects the crossing pedestrians on average 0.26 seconds earlier than using the camera alone, and 0.15 seconds earlier than fusing the sensors without occlusion information. Our dataset containing 501 relevant recordings of pedestrians behind vehicles will be publicly available on our website for non-commercial, scientific use.
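The abstract does not specify the filter, so the following is only a hypothetical illustration of why adapting the expected detection rate to visibility matters. It uses the standard Bernoulli-filter update of a track's existence probability after a missed detection, with a detection probability that depends on whether the area is visible to the sensor; the probability values and function names are assumptions.

```python
def detection_probability(visible: bool, pd_visible: float = 0.9,
                          pd_occluded: float = 0.1) -> float:
    """Hypothetical sensor model: detections are expected mostly in visible areas."""
    return pd_visible if visible else pd_occluded

def update_existence_missed(r: float, pd: float) -> float:
    """Existence probability after a scan with no detection for the track
    (Bernoulli missed-detection update): r' = r (1 - pd) / (1 - r pd)."""
    return r * (1.0 - pd) / (1.0 - r * pd)

if __name__ == "__main__":
    r = 0.5
    # a miss in a visible area is strong evidence against the pedestrian ...
    print(update_existence_missed(r, detection_probability(visible=True)))
    # ... while a miss behind a parked car is nearly uninformative
    print(update_existence_missed(r, detection_probability(visible=False)))
```

An occlusion-agnostic model would apply the visible-area detection probability everywhere and quickly discard a pedestrian hidden behind a parked vehicle, which is one way such a model can lose time before the pedestrian emerges.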
IntrApose: Monocular Driver 6 DOF Head Pose Estimation Leveraging Camera Intrinsics
We present intrApose, a novel method for continuous 6 DOF head pose estimation from a single camera image without prior detection or landmark localization. We argue that using camera intrinsics alongside the intensity information is essential for accurate pose estimation. The proposed head pose estimation framework is crop-aware and scale-aware, i.e., it keeps poses estimated within image cut-outs consistent with the whole image. It employs a continuous, differentiable rotation representation that simplifies the overall architecture compared to existing methods. Our method is validated on DD-Pose, a challenging real-world in-vehicle driver observation dataset that offers a broad spectrum of poses and occlusion states from naturalistic driving scenarios. In ablation studies, we compare rotation and translation errors of intrinsics-aware and -agnostic methods, continuous and discontinuous rotation representations, and data sampling strategies. Experiments show that leveraging camera intrinsics and a continuous rotation representation (SVDO+) results in a balanced mean angular error (BMAE) of 5.8°, compared to 14.8° for the intrinsics-agnostic baseline with a discontinuous rotation representation. Furthermore, training with an unbiased data distribution (the raw data is biased because most driver measurements are close to frontal) improved BMAE on the hard subset (extreme orientations and occlusions) from 15.3° to 9.5°.
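A continuous rotation representation of this kind lets a network regress an unconstrained 3x3 matrix that is then projected onto the rotation group. The sketch below assumes SVDO+ denotes the SVD-based projection onto SO(3) with a determinant correction, as the name suggests; it also includes the geodesic angle between two rotations, the quantity underlying angular-error metrics. It is an illustration, not the intrApose code.

```python
import numpy as np

def svd_orthogonalize_special(m: np.ndarray) -> np.ndarray:
    """Project a 3x3 matrix onto SO(3): the closest rotation in Frobenius norm."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))  # +1 or -1, since u and vt are orthogonal
    # flipping the last singular direction when needed guarantees det(R) = +1
    return u @ np.diag([1.0, 1.0, d]) @ vt

def rotation_angle_deg(r1: np.ndarray, r2: np.ndarray) -> float:
    """Geodesic angle between two rotation matrices, in degrees."""
    c = (np.trace(r1.T @ r2) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = np.eye(3) + 0.05 * rng.normal(size=(3, 3))  # e.g. an unconstrained network output
    r = svd_orthogonalize_special(raw)
    print(np.round(r.T @ r, 6), np.linalg.det(r))
    print(rotation_angle_deg(r, np.eye(3)))
```

Unlike Euler angles, this mapping has no discontinuities over the range of head orientations, which is the property the ablation on continuous versus discontinuous representations targets.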
Next-generation automotive radars provide elevation data in addition to range, azimuth and Doppler velocity. In this experimental study, we apply a state-of-the-art object detector (PointPillars), previously used for 3D LiDAR data, to such 3+1D radar data (where 1D refers to Doppler). In ablation studies, we first explore the benefits of the additional elevation information, together with that of Doppler, radar cross section and temporal accumulation, in the context of multi-class road user detection. We subsequently compare object detection performance on the radar and LiDAR point clouds, per object class and as a function of distance. To facilitate our experimental study, we present the novel View-of-Delft (VoD) automotive dataset. It contains 8,693 frames of synchronized and calibrated 64-layer LiDAR, (stereo) camera, and 3+1D radar data acquired in complex, urban traffic. It consists of 123,106 3D bounding box annotations of both moving and static objects, including 26,587 pedestrian, 10,800 cyclist and 26,949 car labels. Our results show that object detection on 64-layer LiDAR data still outperforms that on 3+1D radar data, but the addition of elevation information and integration of successive radar scans helps close the gap. The VoD dataset is made freely available for scientific benchmarking.
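The abstract does not detail how successive radar scans are integrated. A common approach, assumed here, is to transform past scans into the current ego frame using the ego poses and to append a time-offset channel so the detector can distinguish old from new points. The sketch assumes planar ego motion (x, y, yaw), leaves elevation unchanged and does not compensate Doppler for ego motion; all names are hypothetical.

```python
import math

def to_current_frame(point_xy, pose_past, pose_now):
    """Map a 2D point from a past ego frame to the current ego frame.
    Poses are (x, y, yaw) of the ego vehicle in a fixed world frame."""
    xp, yp, thp = pose_past
    # past ego frame -> world
    wx = xp + math.cos(thp) * point_xy[0] - math.sin(thp) * point_xy[1]
    wy = yp + math.sin(thp) * point_xy[0] + math.cos(thp) * point_xy[1]
    xn, yn, thn = pose_now
    dx, dy = wx - xn, wy - yn
    # world -> current ego frame (inverse rotation)
    return (math.cos(thn) * dx + math.sin(thn) * dy,
            -math.sin(thn) * dx + math.cos(thn) * dy)

def accumulate_scans(scans, poses, times):
    """Merge scans into the frame of the last (current) scan.

    scans[i]: list of (x, y, z, rcs, doppler) points in ego frame i.
    poses[i], times[i]: ego pose and timestamp (s) of scan i.
    Returns points (x, y, z, rcs, doppler, dt) with dt = t_i - t_now <= 0.
    """
    pose_now, t_now = poses[-1], times[-1]
    merged = []
    for scan, pose, t in zip(scans, poses, times):
        for x, y, z, rcs, doppler in scan:
            cx, cy = to_current_frame((x, y), pose, pose_now)
            merged.append((cx, cy, z, rcs, doppler, t - t_now))
    return merged

if __name__ == "__main__":
    scans = [[(12.0, 0.5, 0.4, 8.0, -1.1)], [(11.0, 0.5, 0.4, 8.5, -1.0)]]
    poses = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(accumulate_scans(scans, poses, [0.0, 0.1]))
```

The merged list can then be fed to a point-cloud detector such as PointPillars as one denser cloud, at the cost of smearing moving objects over the accumulation window.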
State-of-the-art stixel methods fuse dense stereo disparity and semantic class information, e.g. from a Convolutional Neural Network (CNN), into a compact representation of drivable space, obstacles and background. However, they do not explicitly differentiate instances within the same semantic class. We investigate several ways to augment single-frame stixels with instance information, which can be extracted by a CNN from the RGB image input. As a result, our novel Instance Stixels method efficiently computes stixels that account for boundaries of individual objects, and represents instances as grouped stixels that express connectivity. Experiments on the Cityscapes dataset demonstrate that including instance information into the stixel computation itself, rather than as a post-processing step, increases the segmentation performance (i.e. Intersection over Union and Average Precision). This holds especially for overlapping objects of the same class. Furthermore, we show the superiority of our approach in terms of segmentation performance and computational efficiency compared to combining the separate outputs of Semantic Stixels and a state-of-the-art pixel-level CNN. We achieve an average processing throughput of 28 frames per second for 8-pixel-wide stixels on 1792x784-pixel images from the Cityscapes dataset. Our Instance Stixels software is made freely available for non-commercial research purposes.
Semantic scene completion is the task of jointly estimating 3D geometry and semantics of objects and surfaces within a given extent. This is a particularly challenging task on real-world data that is sparse and occluded. We propose a scene segmentation network based on local Deep Implicit Functions as a novel learning-based method for scene completion. Unlike previous work on scene completion, our method produces a continuous scene representation that is not based on voxelization. We encode raw point clouds into a latent space locally and at multiple spatial resolutions. A global scene completion function is subsequently assembled from the localized function patches. We show that this continuous representation is suitable to encode geometric and semantic properties of extensive outdoor scenes without the need for spatial discretization (thus avoiding the trade-off between the level of scene detail and the scene extent that can be covered). We train and evaluate our method on semantically annotated LiDAR scans from the Semantic KITTI dataset. Our experiments verify that our method generates a powerful representation that can be decoded into a dense 3D description of a given scene. The performance of our method surpasses the state of the art on the Semantic KITTI Scene Completion Benchmark in terms of geometric completion intersection-over-union (IoU).
We present a novel method for vehicle-pedestrian path prediction that takes into account the awareness of the driver and the pedestrian towards each other. The method jointly models the paths of vehicle and pedestrian within a single Dynamic Bayesian Network (DBN). In this DBN, sub-graphs model the environment and entity-specific context cues of the vehicle and pedestrian (incl. awareness), which affect their future motion and allow the prediction horizon to be extended. These sub-graphs share a latent state which models whether vehicle and pedestrian are on a collision course; this accounts for a certain degree of motion coupling. The method was validated with real-world data obtained by onboard vehicle sensing (stereo vision, GNSS and proprioceptive sensors). The data consist of 93 vehicle and pedestrian encounters, spanning various awareness conditions and dynamic characteristics of the participants. In ablation studies, we quantify the benefits of various components of our proposed DBN model for path prediction and collision risk estimation. Results show that at a prediction horizon of 1.5 s, context-aware models outperform context-agnostic models in path prediction for scenarios with a dynamics change, while performing similarly otherwise. Results further indicate that driver-attention-aware models improve collision risk estimation compared to driver-agnostic models.
Non-verbal communication, such as eye contact between drivers and pedestrians, has been regarded as one way to reduce accident risk. So far, studies have assumed rather than objectively measured the occurrence of eye contact. We address this research gap by developing an eye contact detection method and testing it in an indoor experiment with scripted driver–pedestrian interactions at a pedestrian crossing. Thirty participants each acted as a pedestrian, either standing on an imaginary curb or crossing an imaginary one-lane road in front of a stationary vehicle with an experimenter in the driver's seat. In half of the trials, pedestrians were instructed to make eye contact with the driver; in the other half, they were prohibited from doing so. Both parties’ gaze was recorded using eye trackers. An in-vehicle stereo camera recorded the car's point of view, a head-mounted camera recorded the pedestrian's point of view, and the locations of the driver's and pedestrian's eyes were estimated using image recognition. We demonstrate that eye contact can be detected by measuring the angles between the vector joining the estimated locations of the driver's and pedestrian's eyes and, respectively, the pedestrian's and driver's instantaneous gaze directions, and identifying whether these angles fall below a threshold of 4°. Based on the measured eye contact duration, we achieved 100% correct classification of the trials with and without eye contact. The proposed eye contact detection method may be useful for future research into eye contact.
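The angle test described above can be sketched per frame as follows, assuming eye positions and gaze directions are already expressed in one common 3D frame (the calibration that achieves this is outside the sketch). The function names and the per-frame tuple layout are hypothetical; only the 4° threshold comes from the text.

```python
import math

def angle_deg(u, v):
    """Angle between two 3D vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    c = max(-1.0, min(1.0, dot / (nu * nv)))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(c))

def is_eye_contact(driver_eye, ped_eye, driver_gaze, ped_gaze, threshold_deg=4.0):
    """Eye contact if each party's gaze points along the line joining the eyes."""
    d_to_p = [p - d for p, d in zip(ped_eye, driver_eye)]  # driver eyes -> pedestrian eyes
    p_to_d = [-c for c in d_to_p]                          # pedestrian eyes -> driver eyes
    return (angle_deg(driver_gaze, d_to_p) < threshold_deg and
            angle_deg(ped_gaze, p_to_d) < threshold_deg)

def eye_contact_duration(frames, frame_period_s, threshold_deg=4.0):
    """Total eye contact time; each frame is (driver_eye, ped_eye, driver_gaze, ped_gaze)."""
    return frame_period_s * sum(
        is_eye_contact(*f, threshold_deg=threshold_deg) for f in frames)

if __name__ == "__main__":
    frame = ((0.0, 0.0, 1.2), (8.0, 1.0, 1.6), (8.0, 1.0, 0.4), (-8.0, -1.0, -0.4))
    print(is_eye_contact(*frame))
```

Requiring both angles to be small encodes that eye contact is mutual: a driver looking at a pedestrian who looks elsewhere does not count.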