Advancing Deep Reinforcement Learning for Real-World Traffic Signal Control

Addressing Sampling Challenges and Multi-Modal Traffic Dynamics

Master Thesis (2024)
Author(s)

K.F. Ceton (TU Delft - Mechanical Engineering)

Contributor(s)

S. Grammatico – Mentor (TU Delft - Mechanical Engineering)

Tijs van Bakel – Mentor (Technolution)

G. Pantazis – Mentor (TU Delft - Mechanical Engineering)

A. Dabiri – Graduation committee member (TU Delft - Mechanical Engineering)

Faculty
Mechanical Engineering
Publication Year
2024
Language
English
Graduation Date
09-12-2024
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering
Downloads counter
209
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deep Reinforcement Learning (DRL) is a promising approach to Traffic Signal Control (TSC). However, significant challenges remain in translating this potential into real-world traffic management solutions. This thesis investigates obstacles hindering the application of DRL in real-world TSC, focusing on low sampling frequencies and the complexities of multi-modal traffic scenarios.

We developed a high-frequency sampling Proximal Policy Optimization (PPO) approach for TSC at a four-legged intersection that integrates both vehicle and pedestrian traffic in a multi-modal setting. Invalid Action Masking (IAM) is used to enforce signal timing constraints across these modalities. The framework was evaluated through traffic volume sensitivity analyses, assessments of generalization capabilities, disturbance rejection tests, and comparisons of methods for handling invalid actions.
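To illustrate the general idea behind IAM, the following minimal sketch masks the logits of a discrete policy before the softmax, so that invalid actions receive zero probability. It is not the thesis implementation: the function name, the four-phase action set, and the example mask (as if a minimum green time blocked two phases) are assumptions made purely for illustration.

```python
import math


def masked_action_probabilities(logits, valid_mask):
    """Softmax over logits, with invalid actions forced to probability 0.

    logits:     list of raw policy scores, one per action (hypothetical values)
    valid_mask: list of booleans, True where the action is currently allowed
    """
    if len(logits) != len(valid_mask):
        raise ValueError("logits and valid_mask must have the same length")
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")

    # Replace the logits of invalid actions with -inf, so exp(...) becomes 0.
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid_mask)]

    # Subtract the largest valid logit for numerical stability.
    peak = max(masked)
    weights = [math.exp(m - peak) for m in masked]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: four signal phases, of which phases 1 and 3 are
# currently not allowed (for instance because of a timing constraint).
logits = [0.5, 2.0, 0.1, 1.0]
valid = [True, False, True, False]
probs = masked_action_probabilities(logits, valid)
```

Because the mask is applied before normalization, the remaining probability mass is redistributed over the valid actions only, and the agent can never sample an action that violates the constraint.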

The results indicate that very short sampling intervals, such as 1 second, do not improve performance in terms of time-loss; intervals of 4 to 6 seconds were identified as the optimal range for PPO-based TSC at a four-legged intersection. The findings also demonstrate that IAM can be incorporated without compromising performance. When faced with sudden spikes in traffic volume, PPO outperformed baseline methods such as max-pressure and fixed-time strategies in terms of both overshoot and settling time. The results further show that PPO can effectively prioritize either the vehicle or the pedestrian modality, with a clear proportional decrease in time-loss for the prioritized modality.
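As a rough illustration of the two disturbance rejection metrics named above, the sketch below computes overshoot and settling time from a time series of a performance measure recorded after a traffic spike. The definitions used here (peak above a reference level, and the time after which the signal stays within a tolerance band) are common control-engineering conventions and are assumed for this example; the thesis may define them differently. All names and numbers are hypothetical.

```python
def overshoot_and_settling_time(values, reference, tolerance, step_seconds=1.0):
    """Return (overshoot, settling_time) for a response after a disturbance.

    values:       measurements taken every step_seconds after the disturbance
    reference:    assumed steady-state level the signal should return to
    tolerance:    half-width of the band around the reference
    Settling time is None if the signal is still outside the band at the end.
    """
    if not values:
        raise ValueError("values must not be empty")

    # Overshoot: how far the peak rises above the reference (never negative).
    overshoot = max(0.0, max(values) - reference)

    # Find the last sample that lies outside the tolerance band.
    last_outside = -1
    for i, v in enumerate(values):
        if abs(v - reference) > tolerance:
            last_outside = i

    if last_outside == len(values) - 1:
        return overshoot, None  # never settled within the recorded window
    return overshoot, (last_outside + 1) * step_seconds


# Hypothetical response of average time-loss, sampled every 5 seconds.
response = [10, 18, 25, 20, 14, 11, 10, 10]
overshoot, settling = overshoot_and_settling_time(
    response, reference=10, tolerance=2, step_seconds=5.0
)
```

With these assumed definitions, a controller that rejects disturbances well shows both a lower peak and an earlier return to the tolerance band than its baselines.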

Files

THESIS_REPORT_KOEN_CETON.pdf
(pdf | 15.2 Mb)
- Embargo expired on 31-01-2025
License info not available