Surrogate Reloaded: Fast Testing for Deep Reinforcement Learning

Convolutional Neural Networks as surrogate model for DRL testing

Bachelor Thesis (2025)
Author(s)

L.M. Braszczyński (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Panichella – Mentor (TU Delft - Software Engineering)

A.J. Bartlett – Mentor (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
25-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, Deep Reinforcement Learning (DRL) has moved beyond game playing to practical tasks such as autonomous parking. This transition has created a need for efficient testing of DRL agents. Evaluating an agent requires running a simulation of the task and letting the agent decide which actions to perform, which is expensive and time-consuming: testing needs hundreds of different scenarios, and each simulation takes from seconds to minutes depending on the use case. Choosing the right initial state of the simulation is therefore critical. Current solutions rely on surrogate models, which approximate how difficult a given environment will be for an agent without running the simulation. Existing work has applied this idea to DRL by training a Multi-Layer Perceptron (MLP) as a surrogate model for a DRL agent that attempts to park a car in a parking lot. Parking scenarios are inherently spatial, however, and MLPs cannot take advantage of that spatial structure. To address this limitation, we use Convolutional Neural Networks (CNNs), which are designed to handle spatial information more effectively and should therefore improve prediction performance over MLPs. Our approach transforms the tabular scenario data into a low-resolution, grid-like image representing the parking lot. Used as a surrogate, the resulting CNN guides a genetic algorithm to discover a mean of 25.76 failures per run, a 72% improvement over the 14.98 failures found by the MLP baseline. Our model also achieves significantly higher scores on diversity metrics for the generated environments.
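
The tabular-to-image step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name rasterize_scenario, the record fields (ego position, goal position, parked-vehicle positions), the 40 m by 20 m lot, the 8 by 16 grid, and the three-channel layout are all assumptions chosen for illustration.

```python
import numpy as np


def rasterize_scenario(ego_xy, goal_xy, parked_xy,
                       lot_width=40.0, lot_height=20.0, grid_shape=(8, 16)):
    """Turn one tabular scenario into a channels-first (3, rows, cols) grid.

    Hypothetical layout: channel 0 = ego vehicle, 1 = goal spot,
    2 = parked vehicles. Lot size and grid resolution are assumed values.
    """
    rows, cols = grid_shape
    grid = np.zeros((3, rows, cols), dtype=np.float32)

    def to_cell(x, y):
        # Scale metres to cell indices and clamp to the grid bounds.
        col = int(x / lot_width * cols)
        row = int(y / lot_height * rows)
        return min(max(row, 0), rows - 1), min(max(col, 0), cols - 1)

    r, c = to_cell(*ego_xy)
    grid[0, r, c] = 1.0
    r, c = to_cell(*goal_xy)
    grid[1, r, c] = 1.0
    for x, y in parked_xy:
        r, c = to_cell(x, y)
        grid[2, r, c] = 1.0
    return grid


# One made-up scenario: ego near the entrance, goal in the far corner.
image = rasterize_scenario(
    ego_xy=(2.0, 3.0),
    goal_xy=(38.0, 18.0),
    parked_xy=[(10.0, 5.0), (30.0, 15.0)],
)
print(image.shape)
```

A grid of this kind places nearby objects in neighbouring cells, which is the spatial structure a convolutional layer can exploit and a flat feature vector hides.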

Files

RP_Paper_Braszczynski_LM.pdf
(pdf | 0.571 Mb)
License info not available