Reinforcement Learning (RL) has shown strong potential in complex decision-making domains, but its vulnerability to distributional shifts between training and deployment environments remains a significant barrier to real-world reliability, particularly in safety-critical contexts such as autonomous driving. This study investigates the robustness of the Soft Actor-Critic (SAC) algorithm under such distributional shifts, with a focus on the influence of entropy regularization. Using the HighwayEnv simulator, SAC agents were trained with a range of fixed entropy coefficients as well as with automatic entropy tuning, and were evaluated under varying traffic densities and environmental complexities. Experimental results reveal that moderate fixed entropy settings (0.05 and 0.2) each perform well under specific conditions, while a high entropy setting (0.9) achieves superior performance in more challenging scenarios. Notably, automatic entropy tuning consistently delivered the best overall results, achieving high average rewards and low crash rates across all test environments. All experiments were conducted on the DelftBlue supercomputer to ensure computational reliability and scalability. These findings underscore the importance of adaptive exploration strategies for improving policy generalization in the face of distributional shifts.
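To make the experimental setup concrete, the following is a minimal sketch of how SAC agents could be trained on HighwayEnv with either a fixed entropy coefficient or automatic entropy tuning. The study does not specify its implementation stack, so this sketch assumes the stable-baselines3 and highway-env packages, a continuous action space (SAC requires one), and an illustrative training budget; the environment configuration and timestep count are placeholders, not the paper's settings.

```python
# Hypothetical sketch (assumes stable-baselines3 and highway-env; not the paper's exact code).
import gymnasium as gym
import highway_env  # noqa: F401  # importing registers the highway-v0 environments
from stable_baselines3 import SAC


def make_env(vehicles_count: int = 50):
    """Build a highway scenario; SAC needs a continuous (Box) action space."""
    env = gym.make("highway-v0")
    env.unwrapped.configure({
        "action": {"type": "ContinuousAction"},  # steering + throttle control
        "vehicles_count": vehicles_count,        # rough proxy for traffic density
    })
    env.reset()
    return env


# Fixed entropy coefficients from the study, plus automatic entropy tuning.
entropy_settings = [0.05, 0.2, 0.9, "auto"]

for ent_coef in entropy_settings:
    model = SAC(
        "MlpPolicy",
        make_env(),
        ent_coef=ent_coef,  # float = fixed alpha; "auto" = alpha learned during training
        verbose=0,
    )
    model.learn(total_timesteps=100_000)  # illustrative budget only
    model.save(f"sac_highway_ent_{ent_coef}")
```

Evaluating each saved policy across environments with different `vehicles_count` values (or other configuration changes) would then expose the agents to the kind of distributional shift examined in the study.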