Surrogate Reloaded: Fast Testing for Deep Reinforcement Learning with Bayesian Neural Networks

Bachelor Thesis (2025)

Author(s)

R. Montero González (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Panichella – Mentor (TU Delft - Software Engineering)

A.J. Bartlett – Mentor (TU Delft - Multimedia Computing)

Faculty

Electrical Engineering, Mathematics and Computer Science

Deep Reinforcement Learning, Surrogate Modelling, Bayesian Neural Networks (BNNs)

To reference this document use:

https://resolver.tudelft.nl/uuid:fde4dcc4-7131-4fb5-bd4f-e58b2a91f9f2

Publication Year

2025

Language

English

Graduation Date

25-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deep Reinforcement Learning (DRL) is a powerful framework for training autonomous agents in complex environments. However, testing these agents remains prohibitively expensive because it requires extensive simulation and because failure events, such as collisions or timeouts in which the agent fails to complete its task safely or correctly, are rare. Existing surrogate models, such as Multi-Layer Perceptrons (MLPs), offer a promising improvement by predicting failures without requiring full simulation runs. However, prior research has focused almost exclusively on MLPs, leaving it unclear whether other, more expressive machine learning models could improve performance. In this paper, we investigate whether Bayesian Neural Networks (BNNs), which incorporate probabilistic reasoning into neural architectures, can be more effective surrogates for failure prediction in DRL environments. We developed, trained, and evaluated a BNN surrogate and compared it against a pre-trained MLP baseline, using the HighwayEnv car parking environment as our test case. Our evaluation compared the models' accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC) using training data, and assessed their effectiveness in the DRL parking environment. The results show that the BNN surrogate outperforms the MLP baseline in terms of practical utility for failure discovery. These findings suggest that BNNs can be a more effective surrogate model for prioritising failure scenarios in DRL testing.
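
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how they could be computed for any surrogate that outputs a failure probability per test scenario. It is not the thesis implementation: the function name classification_metrics, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration.

```python
from typing import Dict, List


def classification_metrics(
    y_true: List[int], y_score: List[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1 and AUC-ROC.

    y_true  : 1 if the scenario ended in a failure, 0 otherwise.
    y_score : predicted failure probability from a surrogate model.
    threshold is an assumed cut-off, not a value taken from the thesis.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # AUC-ROC as the probability that a randomly chosen failure is scored
    # higher than a randomly chosen non-failure (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if pos and neg:
        wins = sum(
            1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
        )
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc_roc": auc,
    }


# Toy data (hypothetical): three failures among eight scenarios.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_score = [0.9, 0.2, 0.6, 0.7, 0.1, 0.3, 0.4, 0.35]
metrics = classification_metrics(y_true, y_score)
```

Because failures are rare, accuracy alone can look high for a model that never predicts a failure, which is one reason precision, recall, F1 and AUC-ROC are usually reported alongside it.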

Files

RP_Paper_Rodrigo_MG.pdf

(pdf | 0.753 Mb)

License info not available