Evaluating the Robustness of SAC under Distributional Shifts in Driving Domain

Bachelor Thesis (2025)
Author(s)

L. Polovina (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

F.A. Oliehoek – Mentor (TU Delft - Sequential Decision Making)

M.M. Celikok – Mentor (TU Delft - Sequential Decision Making)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
25-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement Learning (RL) has shown strong potential in complex decision-making domains, but its sensitivity to distributional shifts between training and deployment environments remains a significant barrier to real-world reliability, particularly in safety-critical contexts such as autonomous driving. This study investigates the robustness of the Soft Actor-Critic (SAC) algorithm under such distributional shifts, with a focus on the influence of entropy regularization. Using the HighwayEnv simulator, SAC agents were trained with a range of fixed entropy coefficients as well as automatic entropy tuning, and were evaluated under varying traffic densities and environmental complexities. Experimental results reveal that moderate fixed entropy settings (0.05 and 0.2) each perform well under specific conditions, while a high entropy setting (0.9) achieves superior performance in more challenging scenarios. Notably, automatic entropy tuning consistently delivered the best overall results, achieving high average rewards and low crash rates across all test environments. All experiments were conducted on the DelftBlue supercomputer to ensure computational reliability and scalability. These findings underscore the importance of adaptive exploration strategies in improving policy generalization in the face of distributional shifts.
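The automatic entropy tuning highlighted above refers, in standard SAC, to a gradient update on the temperature coefficient α toward a fixed target entropy, rather than holding α at a constant such as 0.05 or 0.9. The following is a minimal NumPy sketch of that update, not code from the thesis: it assumes a 2-dimensional continuous action space (hence a conventional target entropy of -2) and uses illustrative, hypothetical log-probability values.

```python
import numpy as np

# Hypothetical sketch of SAC's automatic temperature (alpha) tuning.
# The target entropy is conventionally -dim(A); assume a 2-D action space.
target_entropy = -2.0
log_alpha = 0.0          # optimize log(alpha) so alpha stays positive
lr = 0.05

# Illustrative log pi(a|s) values for a batch of sampled actions.
log_probs = np.array([-1.0, -1.5, -0.5])

for _ in range(100):
    # Temperature loss: J(alpha) = E[-alpha * (log pi + target_entropy)]
    # so d J / d log_alpha = -alpha * E[log pi + target_entropy].
    alpha = np.exp(log_alpha)
    grad = -alpha * np.mean(log_probs + target_entropy)
    log_alpha -= lr * grad   # gradient descent on J(alpha)

alpha = np.exp(log_alpha)
# Here the policy entropy (-E[log pi] = 1.0) exceeds the target (-2.0),
# so the update drives alpha down, weakening the entropy bonus.
```

In training frameworks such as Stable-Baselines3, this mechanism is typically enabled by passing `ent_coef="auto"` to the SAC constructor, versus a fixed float like `ent_coef=0.05` for the constant-coefficient runs.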

Files

Research_paper-35.pdf
(pdf | 0.365 Mb)
License info not available