A Data Augmentation Pipeline: Leveraging Controllable Diffusion Models and Automotive Simulation Software

Master Thesis (2024)
Author(s)

J.S. van Leuven (TU Delft - Mechanical Engineering)

Contributor(s)

Son Tong – Mentor

J. Kober – Mentor (TU Delft - Learning & Autonomous Control)

Holger Caesar – Mentor (TU Delft - Intelligent Vehicles)

Faculty
Mechanical Engineering
Publication Year
2024
Language
English
Graduation Date
26-07-2024
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Training models for autonomous vehicles (AVs) requires large volumes of high-quality data, because model performance correlates strongly with dataset size. Acquiring such datasets is labor-intensive and expensive, requiring significant resources for collection and labeling. Augmenting the existing dataset or generating synthetic data offers a cost-effective way to get more value from the data already available. Traditional methods that operate within the RGB domain frequently overlook crucial information, such as object frequency, scene composition, and agent trajectories. To address these limitations, this thesis proposes a pipeline that combines controllable diffusion models with vehicle simulation software. Collected data is loaded into a physics-based simulator, which allows augmentation beyond pixel space, in structural space. The augmented simulated data is then transformed back into the photorealistic domain using generative artificial intelligence. The result is high-fidelity synthetic data that lets models train on an expanded and more varied dataset, improving robustness. The proposed method is evaluated in both the image and video domains to assess its effectiveness.
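
The three stages described in the abstract (load a scene into a simulator, augment it in structural space, translate the result back to the photorealistic domain) can be illustrated with a minimal Python sketch. This is not the thesis implementation: `Agent`, `Scene`, `augment_structure`, and `run_pipeline` are hypothetical names, the coordinate conventions and parameter values are assumptions, and the simulator and the diffusion model are passed in as plain callables so that no specific tool or API is implied.

```python
import random
from dataclasses import dataclass, replace
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Agent:
    """Hypothetical scene agent; fields and units are assumptions."""
    kind: str      # e.g. "car" or "pedestrian"
    x: float       # longitudinal position in metres
    y: float       # lateral position in metres
    speed: float   # metres per second


@dataclass(frozen=True)
class Scene:
    """Hypothetical structural description of one recorded scene."""
    agents: Tuple[Agent, ...]


def augment_structure(scene: Scene, rng: random.Random,
                      max_shift: float = 2.0,
                      spawn_prob: float = 0.5) -> Scene:
    """Augment in structural space: shift agents laterally, maybe add one."""
    agents = [replace(a, y=a.y + rng.uniform(-max_shift, max_shift))
              for a in scene.agents]
    if rng.random() < spawn_prob:
        # Changes object frequency and scene composition, which
        # pixel-level augmentation cannot do.
        agents.append(Agent("car", rng.uniform(10.0, 50.0), 0.0,
                            rng.uniform(5.0, 15.0)))
    return Scene(tuple(agents))


def run_pipeline(scene: Scene, n_variants: int,
                 simulate: Callable[[Scene], Any],
                 to_photoreal: Callable[[Any], Any],
                 seed: int = 0) -> List[Any]:
    """Generate n_variants synthetic samples from one recorded scene.

    simulate:     stand-in for the physics-based simulator; returns a
                  conditioning signal (for example a depth or
                  segmentation render) for the augmented scene.
    to_photoreal: stand-in for the controllable diffusion model that
                  maps the conditioning signal to a photorealistic sample.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_variants):
        variant = augment_structure(scene, rng)
        conditioning = simulate(variant)
        samples.append(to_photoreal(conditioning))
    return samples


if __name__ == "__main__":
    base = Scene((Agent("car", 20.0, 0.0, 10.0),))
    # Stub callables keep the sketch runnable without any simulator or model.
    out = run_pipeline(base, 3,
                       simulate=lambda s: {"n_agents": len(s.agents)},
                       to_photoreal=lambda c: c)
    print(out)
```

Passing the simulator and the generative model as callables keeps the structural augmentation step independent of any particular tool, so either stand-in could in principle be replaced by a real simulator binding or a conditioned diffusion pipeline.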

Files

Thesis_v20.pdf
(pdf | 17.8 Mb)
License info not available