The Effects of Entropy Regularization and Lyapunov Stability Constraint on Multi-Agent Reinforcement Learning for Autonomous Driving



High-level decision making in Autonomous Driving (AD) is a challenging task due to the presence of multiple actors and complex driving interactions. Multi-Agent Reinforcement Learning (MARL) has been proposed to learn multiple driving policies concurrently to solve AD tasks. In the literature, multi-agent algorithms have been shown to outperform both single-agent and rule-based algorithms. Several techniques have also been employed to facilitate convergence in policy learning, such as parameter sharing and local reward design, and functional safety in AD has been addressed with techniques such as unsafe-action masking. However, the literature lacks a study of the effects of entropy regularization, and of policies learned with a closed-loop stability guarantee, on AD tasks in MARL. This thesis addresses these gaps by applying entropy regularization and Lyapunov-stability-constrained policy objectives to Autonomous Driving in MARL. Specifically, it is demonstrated on the lane-keeping task with two agents that entropy regularization improves training stability. It is also shown that, under low measurement-noise perturbation, stochastic multi-agent algorithms with a Lyapunov-constrained policy objective outperform their unconstrained counterparts on the lane-keeping task in average episode returns, success rate, and collision rate. However, an algorithm with a stochastic actor performs worse than one with a deterministic actor in stability and lane-center proximity on the lane-keeping task.
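For context, the two policy-learning ingredients studied in the thesis are commonly written in the following general forms; these are standard formulations from the entropy-regularized RL and Lyapunov-constrained RL literature, and the symbols ($\alpha$, $L_\phi$, and the decrement bound) are illustrative rather than the thesis's exact notation:

```latex
% Entropy-regularized policy objective for agent i (soft-RL style):
% the temperature \alpha trades off return against policy entropy.
J(\pi_i) \;=\; \mathbb{E}_{\pi_i}\!\left[\sum_{t=0}^{\infty} \gamma^{t}
\Big( r_i(s_t, a_t) \;+\; \alpha \, \mathcal{H}\big(\pi_i(\cdot \mid s_t)\big) \Big)\right]

% Lyapunov stability constraint: a learned candidate function L_\phi
% is required to decrease in expectation along closed-loop trajectories,
% so the policy is optimized subject to
\mathbb{E}_{s_{t+1} \sim P(\cdot \mid s_t, a_t),\, a_t \sim \pi_i}
\big[ L_\phi(s_{t+1}) - L_\phi(s_t) \big] \;\le\; -\beta \, L_\phi(s_t),
\qquad \beta > 0
```

Intuitively, the entropy bonus keeps exploration alive and smooths the optimization landscape during concurrent multi-agent training, while the Lyapunov decrement condition biases the learned policy toward trajectories that contract toward a safe operating region (e.g., the lane center in lane keeping).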