Evaluating the Robustness of SAC under Distributional Shifts in Driving Domain
L. Polovina (TU Delft - Electrical Engineering, Mathematics and Computer Science)
FA Oliehoek – Mentor (TU Delft - Sequential Decision Making)
M.M. Celikok – Mentor (TU Delft - Sequential Decision Making)
Abstract
Reinforcement Learning (RL) has shown strong potential in complex decision-making domains, but its sensitivity to distributional shifts between training and deployment environments remains a significant barrier to real-world reliability, particularly in safety-critical contexts such as autonomous driving. This study investigates the robustness of the Soft Actor-Critic (SAC) algorithm under such distributional shifts, with a focus on the influence of entropy regularization. Using the HighwayEnv simulator, SAC agents were trained with a range of fixed entropy coefficients as well as with automatic entropy tuning, and were then evaluated under varying traffic densities and environmental complexities. Experimental results show that moderate fixed entropy settings (0.05 and 0.2) each perform well under specific conditions, while a high entropy setting (0.9) achieves superior performance in more challenging scenarios. Notably, automatic entropy tuning consistently delivered the best overall results, achieving high average rewards and low crash rates across all test environments. All experiments were conducted on the DelftBlue supercomputer to ensure computational reliability and scalability. These findings underscore the importance of adaptive exploration strategies for improving policy generalization under distributional shift.
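
The following is a minimal sketch (not the author's original code) of the kind of setup the abstract describes: training SAC on HighwayEnv with either a fixed or an automatically tuned entropy coefficient. It assumes the highway-env and stable-baselines3 packages; the environment configuration and hyperparameters are illustrative, and traffic density is varied here through the number of surrounding vehicles.

    # Sketch only: SAC on HighwayEnv with fixed vs. automatic entropy tuning.
    import gymnasium as gym
    import highway_env  # registers the "highway-v0" environment
    from stable_baselines3 import SAC

    def make_env(vehicles_count=50):
        # SAC requires a continuous action space, so the environment is
        # configured to use continuous steering/throttle actions.
        return gym.make(
            "highway-v0",
            config={
                "action": {"type": "ContinuousAction"},
                "vehicles_count": vehicles_count,  # illustrative density knob
            },
        )

    # Fixed entropy coefficient (e.g. 0.05, 0.2, or 0.9 as in the study) ...
    fixed_agent = SAC("MlpPolicy", make_env(), ent_coef=0.2, verbose=0)
    # ... versus automatic entropy tuning.
    auto_agent = SAC("MlpPolicy", make_env(), ent_coef="auto", verbose=0)

    fixed_agent.learn(total_timesteps=100_000)
    auto_agent.learn(total_timesteps=100_000)

Evaluating such agents on environments with a different vehicles_count (or other configuration) than used during training is one way to induce the distributional shift the study examines.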