The Effects of Entropy Regularization and Lyapunov Stability Constraint on Multi-Agent Reinforcement Learning for Autonomous Driving

Madi, Mohamed

The Effects of Entropy Regularization and Lyapunov Stability Constraint on Multi-Agent Reinforcement Learning for Autonomous Driving

Title

The Effects of Entropy Regularization and Lyapunov Stability Constraint on Multi-Agent Reinforcement Learning for Autonomous Driving

Author

Madi, Mohamed (TU Delft Mechanical, Maritime and Materials Engineering)

Contributor

Pan, W. (mentor)

Degree granting institution

Delft University of Technology

Programme

Mechanical Engineering

Date

2022-08-31

Abstract

High level decision making in Autonomous Driving (AD) is a challenging task due to the presence of multiple actors and complex driving interactions. Multi-Agent Reinforcement Learning (MARL) has been proposed to learn multiple driving policies concurrently to solve AD tasks. In the literature, multi-agent algorithms have been shown to outperform single-agent algorithms and rule-based algorithms. Also several techniques have been employed to facilitate convergence in policy learning such as parameter sharing and local reward design. Further, functional safety in AD has been addressed with techniques such as unsafe action-masking. However, there is a gap in the literature on the study of the effects of entropy regularization and on policies learned with closed-loop stability guarantee to solve AD tasks in MARL. In this thesis, research gaps are addressed in entropy regularization and Lyapunov stability constrained policy objectives applied to Autonomous Driving in MARL. Specifically, it is demonstrated on the lane-keeping task with 2 agents that entropy regularization improves training stability. It was also shown that in stochastic multi-agent algorithms on the lane-keeping task, a Lyapunov constrained policy objective performs better in average episode returns, success rate and collision rate than a policy objective without a Lyapunov constraint with low measurement noise perturbation. However, an algorithm with a stochastic actor performs worse than that with a deterministic actor in stability and lane center proximity on the lane-keeping task.

Subject

Multi-agent reinforcement learning
Autonomous driving
Entropy regularization
Stability

To reference this document use:

http://resolver.tudelft.nl/uuid:743d8142-9d59-4c97-a8da-2c0d734c8ebc

Part of collection

Student theses

Document type

master thesis

Rights

Files

PDF

Thesis_1.pdf

864.83 KB

Close viewer