A Novel Framework Combining MPC and Deep Reinforcement Learning With Application to Freeway Traffic Control

Journal Article (2024)
Author(s)

D. Sun (TU Delft - Transport and Planning)

Anahita Jamshidnejad (TU Delft - Control & Simulation)

B.H.K. De Schutter (TU Delft - Delft Center for Systems and Control)

Research Group
Team Bart De Schutter
DOI related publication
https://doi.org/10.1109/TITS.2023.3342651
Publication Year
2024
Language
English
Issue number
7
Volume number
25
Pages (from-to)
6756-6769
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Model predictive control (MPC) and deep reinforcement learning (DRL) have been developed extensively as two independent techniques for traffic management. Although the features of MPC and DRL complement each other well, few current studies consider combining the two methods for freeway traffic control. This paper proposes a novel framework for integrating MPC and DRL for freeway traffic control that differs from existing MPC-(D)RL methods. Specifically, the proposed framework adopts a hierarchical structure, where an efficient high-level MPC component works at a low frequency to provide a baseline control input, while the DRL component works at a high frequency to modify the MPC output online. The control framework therefore needs only limited online computational resources and can handle uncertainties and external disturbances once it has been properly trained on sufficient data. The proposed framework is implemented on a benchmark freeway network to coordinate ramp metering and variable speed limits, and its performance is compared with standard MPC and DRL approaches. The simulation results show that the proposed framework outperforms standalone MPC and DRL methods in terms of total time spent (TTS) and constraint satisfaction, despite model uncertainties and external disturbances.
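
The two-rate structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name, the additive form of the correction, the clipping bounds, and the three callables (`baseline_policy` standing in for MPC, `correction_policy` standing in for the trained DRL agent, `plant_step` standing in for the traffic model) are all hypothetical placeholders chosen for illustration. The abstract only states that MPC provides a baseline at a low frequency and DRL modifies it at a high frequency.

```python
from typing import Callable, List, Sequence


def run_hierarchical_control(
    n_steps: int,
    mpc_period: int,
    baseline_policy: Callable[[Sequence[float]], List[float]],
    correction_policy: Callable[[Sequence[float], Sequence[float]], List[float]],
    plant_step: Callable[[Sequence[float], Sequence[float]], List[float]],
    initial_state: Sequence[float],
    u_min: float = 0.0,
    u_max: float = 1.0,
) -> List[List[float]]:
    """Simulate a slow baseline controller corrected by a fast one.

    All arguments are hypothetical stand-ins, not taken from the paper:
    baseline_policy plays the role of MPC, correction_policy the role of
    the DRL agent, and plant_step the role of the freeway model.
    """
    state = list(initial_state)
    baseline: List[float] = []
    applied: List[List[float]] = []

    for k in range(n_steps):
        # Low-frequency layer: recompute the baseline every mpc_period steps.
        if k % mpc_period == 0:
            baseline = baseline_policy(state)

        # High-frequency layer: compute a correction at every step.
        delta = correction_policy(state, baseline)

        # Assumption: the correction is additive and the result is clipped
        # to actuator bounds (e.g. a normalized ramp-metering rate).
        u = [min(max(b + d, u_min), u_max) for b, d in zip(baseline, delta)]

        applied.append(u)
        state = plant_step(state, u)

    return applied
```

With `mpc_period = 3` and `n_steps = 6`, for example, the baseline policy would be called twice and the correction policy six times, which is the source of the reduced online computation the abstract refers to: the expensive optimization runs rarely, while the cheap learned correction runs at every step.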