Multi-Agent Decision-Making Modes in Uncertain Interactive Traffic Scenarios via Graph Convolution-Based Deep Reinforcement Learning

Journal Article (2022)

Author(s)

Xin Gao (Beijing Institute of Technology)

Xueyuan Li (Beijing Institute of Technology)

Qi Liu (Beijing Institute of Technology)

Zirui Li (Transport and Planning, Beijing Institute of Technology)

Fan Yang (Beijing Institute of Technology)

Tian Luan (Beijing Institute of Technology)

Transport and Planning

Connected autonomous vehicles; GQN; MDGQN; Multi-mode decision-making; Reward function matrix; Uncertain highway exit scene

DOI related publication

https://doi.org/10.3390/s22124586 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:6302ab05-4e3f-4952-87e9-2804fdbe9f01

Publication Year

2022

Language

English

Issue number

12

Volume number

22

Article number

4586

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The reward function is one of the main elements of reinforcement learning, yet its design often receives too little attention in concrete applications, which leads to unsatisfactory performance. In this study, a reward function matrix is proposed for training various decision-making modes, with emphasis on decision-making styles and further emphasis on incentives and punishments. Additionally, we model the traffic scene as a graph to better represent the interactions between vehicles, and adopt a graph convolutional network (GCN) to extract features of the graph structure so that connected autonomous vehicles can make decisions directly from them. Furthermore, we combine GCN with deep Q-learning and with multi-step double deep Q-learning, yielding the graph convolutional deep Q-network (GQN) and the multi-step double graph convolutional deep Q-network (MDGQN), and use them to train four decision-making modes. In simulation, the reward function matrix is shown to outperform the baseline, and evaluation metrics are proposed to verify the performance differences among decision-making modes. Results show that, by adjusting the weight values in the reward function matrix, the trained decision-making modes can satisfy various driving requirements, including task completion rate, safety, comfort, and completion efficiency. Finally, the decision-making modes trained by MDGQN performed better in an uncertain highway exit scene than those trained by GQN.
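
The two generic building blocks named in the abstract, a graph convolution over the vehicle graph and a multi-step double Q-learning target, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the common symmetric-normalized graph convolution with self-loops and a ReLU activation, and a standard n-step double Q-learning target. The function names (`matmul`, `gcn_layer`, `multi_step_double_q_target`) and every numeric value are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    ASSUMPTION: this is the widely used symmetric normalization; the
    source abstract does not state which GCN variant the paper uses.
    adj    -- n x n adjacency matrix (1.0 where two vehicles interact)
    feats  -- n x f node feature matrix (one row per vehicle)
    weight -- f x h learnable weight matrix
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def multi_step_double_q_target(
    rewards: Sequence[float],
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool,
) -> float:
    """n-step return plus a double-Q bootstrap.

    The online network selects the greedy action in the state reached
    after n steps; the target network evaluates that action.
    """
    n = len(rewards)
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if done:
        # Episode ended within the n steps: nothing to bootstrap from.
        return n_step_return
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return n_step_return + (gamma ** n) * q_target_next[best_action]
```

In a setup like the one the abstract describes, the output rows of `gcn_layer` would feed a Q-value head per vehicle, and `multi_step_double_q_target` would supply the regression target for that head; how the paper wires these together is not stated on this page.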

Files

Sensors_22_04586.pdf

(PDF | 13.4 MB)