Cooperative Planning and Control for Connected and Automated Vehicles' On-ramp Merging in Mixed Traffic Through Value Decomposition-based Multiagent Deep Reinforcement Learning

Master Thesis (2024)
Authors

Y. Zhang (TU Delft - Civil Engineering & Geosciences)

Supervisors

Haneen Farah (TU Delft - Traffic Systems Engineering)

Faculty
Civil Engineering & Geosciences
Publication Year
2024
Language
English
Graduation Date
22-11-2024
Awarding Institution
Delft University of Technology
Programme
Transport, Infrastructure and Logistics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Connected and Automated Vehicles (CAVs) have the potential to revolutionize transportation systems, but their integration with human-driven vehicles (HDVs) in mixed traffic environments presents significant challenges, particularly in complex scenarios such as on-ramp merging. This paper addresses on-ramp merging for CAVs in mixed traffic and proposes a novel approach called QMIX-QLambdaM. We formulate the problem as a Centralized Training with Decentralized Execution (CTDE) Cooperative Multi-Agent Reinforcement Learning (MARL) task capable of handling dynamic scenarios with both CAVs and HDVs. QMIX-QLambdaM enhances the QMIX algorithm by incorporating Q(λ) returns for improved value estimation and an action masking mechanism for safer action selection. Our comprehensive experiments demonstrate that QMIX-QLambdaM consistently outperforms state-of-the-art algorithms, including QMIX, MAA2C, and COMA, across various performance metrics related to traffic efficiency and safety. The proposed method exhibits superior adaptability across different traffic densities, maintaining high performance in terms of safety, efficiency, and overall rewards. Furthermore, case studies illustrate QMIX-QLambdaM’s ability to generate effective strategic control for both main-lane and merging-lane vehicles, showing smoother driving behavior and better collision avoidance than the baseline methods. The learning curve comparison also reveals QMIX-QLambdaM’s advantage in credit assignment over other CTDE baselines for the formulated problem. The code is available at https://github.com/ayton-zhang/MARL_qmix_merging.
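The two additions to QMIX named in the abstract, Q(λ) returns and action masking, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the default values of gamma and lam, and the particular λ-return recursion (a Peng-style form that bootstraps from a greedy next-state value) are assumptions made here for illustration only.

```python
import numpy as np

def masked_greedy_action(q_values, valid_mask):
    """Pick the greedy action among those flagged as valid.

    q_values:   per-action Q estimates for one agent (hypothetical input).
    valid_mask: boolean array, True where the action is allowed.
    """
    # Invalid actions get -inf so argmax can never select them.
    masked_q = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))

def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Backward recursion for lambda-returns over one trajectory.

    rewards[t]:     reward received at step t.
    next_values[t]: bootstrap value of the state reached after step t
                    (e.g. a max over Q; use 0.0 for a terminal state).
    Assumed recursion:
        G_t = r_t + gamma * ((1 - lam) * V_{t+1} + lam * G_{t+1})
    with G beyond the last step replaced by the final bootstrap value.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    returns = np.zeros_like(rewards)
    g = next_values[-1]  # bootstrap past the end of the trajectory
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

if __name__ == "__main__":
    q = np.array([1.0, 5.0, 3.0])
    mask = np.array([True, False, True])  # action 1 is deemed unsafe
    print(masked_greedy_action(q, mask))
    print(lambda_returns([1.0, 1.0], [0.0, 0.0], gamma=0.5, lam=1.0))
```

With lam = 0 the recursion reduces to the one-step target used by standard QMIX-style learning, and with lam = 1 it becomes a discounted sum of rewards plus a final bootstrap; intermediate values blend the two. In a QMIX setting these targets would be computed on the mixed joint value rather than per agent, which this sketch does not show.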

Files

Yuteng_thesis.pdf
(pdf | 4.1 Mb)
License info not available