Holger Caesar
The Autonomous Surface Vehicle (ASV) market is expected to double by 2030, rapidly transforming maritime logistics through faster deliveries, lower costs, reduced risks from human error, and the potential to save human lives. ASVs depend on robust object detection models to ensure safe navigation. However, existing models are often susceptible to natural corruptions such as blur, noise, adverse weather, and occlusions, and these risks to perception robustness are further intensified by the lack of domain-specific robustness benchmarks. To fill this gap, we propose the first waterborne-focused robustness benchmark, incorporating 25 synthetic corruptions (15 adapted from ImageNet-C plus 10 novel ones for ASVs) across five severity levels. We also incorporate mixed corruptions to capture real-world complexity. Building on three public waterborne datasets (SeaShips, SMD, SSAVE), we create SeaShips-C, SMD-C, and SSAVE-C, each augmented with our corruption suite. A comprehensive robustness evaluation is conducted on multiple sizes of YOLOv8, SSD, NanoDet-Plus, and RT-DETR, revealing critical vulnerabilities: e.g., YOLOv8n's mAP50 drops by 43.0 % under contrast corruption on SeaShips-C, reaching a 59.5 % decline when combined with raindrops. Larger variants (e.g., YOLOv8x) exhibit greater robustness, offering insights for safer deployments. Aligned with ISO/IEC TR 5469 and IEC 61508, our benchmark supports pre-deployment verification. By identifying risk-prone conditions, practitioners can apply targeted mitigation strategies, such as data augmentation and human oversight. To promote further research and support industrial practice, we provide open access to all benchmark datasets and code, which can also serve as a data augmentation resource to enhance model training.
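To make the idea of a corruption applied at five severity levels concrete, the following is a minimal sketch of a contrast corruption in the style of ImageNet-C. The function name `contrast_corruption` and the per-severity factors are illustrative assumptions, not the settings used to build SeaShips-C, SMD-C, or SSAVE-C.

```python
import numpy as np

# Hypothetical contrast scaling factors, one per severity level
# (1 = mild, 5 = severe). These values are assumptions for illustration.
CONTRAST_FACTORS = (0.4, 0.3, 0.2, 0.1, 0.05)


def contrast_corruption(image, severity=1):
    """Reduce the contrast of a float image in [0, 1] with shape (H, W, C)."""
    if not 1 <= severity <= len(CONTRAST_FACTORS):
        raise ValueError("severity must be between 1 and 5")
    factor = CONTRAST_FACTORS[severity - 1]
    # Pull every pixel toward its per-channel mean; a smaller factor
    # leaves less of the original deviation, i.e. lower contrast.
    channel_means = image.mean(axis=(0, 1), keepdims=True)
    corrupted = (image - channel_means) * factor + channel_means
    return np.clip(corrupted, 0.0, 1.0)
```

Running a detector on the outputs for severities 1 through 5 and comparing each score against the clean-image score would give one robustness curve per corruption type.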
Millimeter-wave (mmWave) radars are critical for autonomous vehicles' perception tasks, offering reliable performance in adverse weather conditions. However, their application is often hindered by insufficient spatial resolution for detailed semantic scene interpretation. Traditional super-resolution methods derived from optical imaging fail to accommodate the unique properties of radar signals. Addressing this, our study redefines radar imaging super-resolution as a one-dimensional (1D) signal super-resolution spectra estimation problem, leveraging domain-specific insights to innovate data normalization and introduce a domain-informed signal-to-noise ratio (SNR)-guided loss function. Our custom deep learning network, tailored for automotive radar imaging, achieves substantial improvements in parameter efficiency and inference speed while enhancing image quality and resolution. Comprehensive tests demonstrate that our SR-SPECNet establishes a new standard for high-resolution radar range-azimuth imaging, surpassing previous methods. Source code and new radar dataset will be made publicly available at https://github.com/ruxinzh/SR DOA.
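As background for the 1D spectral-estimation framing, the sketch below computes a conventional zero-padded FFT azimuth spectrum from a single antenna-array snapshot. This is the classical baseline, not SR-SPECNet; the function names, the eight-element half-wavelength array, and the noise-free signal model are all assumptions chosen for illustration.

```python
import numpy as np


def ula_snapshot(angles_deg, num_antennas=8):
    """Noise-free snapshot of a half-wavelength-spaced uniform linear array."""
    n = np.arange(num_antennas)
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # Each target contributes a complex sinusoid across the antenna index.
    return np.exp(1j * np.pi * np.outer(n, np.sin(angles))).sum(axis=1)


def fft_azimuth_spectrum(snapshot, num_bins=512):
    """Zero-padded FFT over the antenna axis; returns (angles_deg, magnitude)."""
    spectrum = np.fft.fftshift(np.fft.fft(snapshot, n=num_bins))
    spatial_freq = np.fft.fftshift(np.fft.fftfreq(num_bins))  # cycles per element
    # With half-wavelength spacing, spatial frequency f maps to sin(theta) = 2 f.
    angles_deg = np.rad2deg(np.arcsin(2.0 * spatial_freq))
    return angles_deg, np.abs(spectrum)


# Single hypothetical target at 20 degrees azimuth.
angles, magnitude = fft_azimuth_spectrum(ula_snapshot([20.0]))
peak_angle = angles[np.argmax(magnitude)]
```

Zero-padding only interpolates the spectrum. The main-lobe width is still set by the number of antennas, so closely spaced targets can merge into one peak, which is the kind of limitation a learned super-resolution estimator is meant to address.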
NeuroNCAP
Photorealistic Closed-Loop Safety Testing for Autonomous Driving
ECCV 2024 W-CODA
1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving
...
In this paper, we present details of the 1st W-CODA workshop, held in
conjunction with ECCV 2024. W-CODA aims to explore next-generation
solutions for autonomous driving corner cases, empowered by
state-of-the-art multimodal perception and comprehension techniques. Five
speakers from both academia and industry are invited to share their
latest progress and opinions. We collect research papers and hold a
dual-track challenge, including both corner case scene understanding and
generation. As the pioneering effort, we will continuously bridge the
gap between frontier autonomous driving techniques and fully
intelligent, reliable self-driving agents robust towards corner cases.
OpenPSG
Open-Set Panoptic Scene Graph Generation via Large Multimodal Models
We present a vehicle system capable of navigating safely and efficiently around Vulnerable Road Users (VRUs), such as pedestrians and cyclists. The system comprises key modules for environment perception, localization and mapping, motion planning, and control, integrated into a prototype vehicle. A key innovation is a motion planner based on Topology-driven Model Predictive Control (T-MPC). The guidance layer generates multiple trajectories in parallel, each representing a distinct strategy for obstacle avoidance or non-passing. The underlying trajectory optimization constrains the joint probability of collision with VRUs under generic uncertainties. To address extraordinary situations ('edge cases') that go beyond its autonomous capabilities, such as construction zones or encounters with emergency responders, the system includes an option for remote human operation, supported by visual and haptic guidance. In simulation, our motion planner outperforms three baseline approaches in terms of safety and efficiency. We also demonstrate the full system in prototype vehicle tests on a closed track, in both autonomous and remotely operated modes.
Mobility Futures
Four scenarios for the Dutch mobility system in 2050
...
VLPrompt-PSG
Vision-Language Prompting for Panoptic Scene Graph Generation
Panoptic scene graph generation (PSG) aims at achieving a comprehensive image understanding by simultaneously segmenting objects and predicting relations among objects. However, the long-tail problem among relations leads to unsatisfactory results in real-world applications. Prior methods predominantly rely on vision information or utilize limited language information, such as object or relation names, thereby overlooking the utility of language information. Leveraging the recent progress in Large Language Models (LLMs), we propose to use language information to assist relation prediction, particularly for rare relations. To this end, we propose the Vision-Language Prompting (VLPrompt) model, which acquires vision information from images and language information from LLMs. Then, through a prompter network based on an attention mechanism, it achieves precise relation prediction. Our extensive experiments show that VLPrompt significantly outperforms previous state-of-the-art methods on the PSG dataset, proving the effectiveness of incorporating language information and alleviating the long-tail problem of relations. Code is available at https://github.com/franciszzj/VLPrompt.
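The attention-based fusion of vision and language features can be illustrated with a generic scaled dot-product cross-attention, in which vision features act as queries over language features. This is a simplified sketch without learned projections, not the VLPrompt prompter network itself; the function names and array shapes are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(vision, language):
    """Let each vision feature attend over all language features.

    vision:   (num_pairs, dim) features, e.g. one per subject-object pair
    language: (num_prompts, dim) features, e.g. one per text description
    Returns the language-informed features and the attention weights.
    """
    dim = vision.shape[-1]
    scores = vision @ language.T / np.sqrt(dim)  # (num_pairs, num_prompts)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ language, weights
```

In a trained model the queries, keys, and values would pass through learned linear projections first; they are omitted here to keep the mechanism visible.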