Impact of Pre-training on Deep Reinforcement Learning Ramp Metering Systems
Callum Evans (TU Delft - Traffic Systems Engineering)
Marco Rinaldi (TU Delft - Traffic Systems Engineering)
Henk Taale (TU Delft - Traffic Systems Engineering)
Serge Hoogendoorn (TU Delft - Traffic Systems Engineering)
Abstract
Pre-training is a process used to enhance the learning of deep reinforcement learning (RL) algorithms through initial guidance from an expert demonstrator: a neural network is first trained to replicate the outputs of the chosen expert before the RL agent is allowed to specialise and develop its own policy. This paper analyses the impact of pre-training on deep RL algorithms used for ramp metering. Specifically, behaviour cloning is performed for increasing durations (0-10,000 epochs), with ALINEA as the expert algorithm guiding a proposed Proximal Policy Optimisation (PPO)-based system. The results confirm that, for the same total training budget, some initial guidance through pre-training can significantly improve the system's effectiveness in reducing congestion compared to no pre-training. Conversely, excessive pre-training may lead to overfitting and reduced generalisability. Design issues resulting in weak model convergence, however, limit the algorithm's overall performance in the chosen scenario.
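To make the pipeline concrete, the sketch below shows the two ingredients the abstract names: the standard ALINEA feedback law, and behaviour cloning as supervised regression of a (here deliberately tiny, linear) policy onto ALINEA's metering rates. The gain, occupancy set-point, rate bounds, learning rate, and epoch count are illustrative placeholders, not the paper's actual parameters, and the real system would clone a deep PPO policy network rather than a linear model.

```python
import numpy as np

def alinea_rate(prev_rate, occ_meas, occ_target=0.25, K_R=70.0,
                r_min=200.0, r_max=1800.0):
    """One step of the ALINEA feedback law (rates in veh/h):
    r(k) = r(k-1) + K_R * (occ_target - occ_meas), clamped to bounds.
    K_R, occ_target and the bounds are illustrative values only."""
    rate = prev_rate + K_R * (occ_target - occ_meas)
    return max(r_min, min(r_max, rate))

# Behaviour cloning: fit a toy linear policy rate = w*occ + b to the
# expert's outputs by gradient descent on the mean squared error.
rng = np.random.default_rng(0)
occ = rng.uniform(0.0, 0.5, size=256)            # synthetic occupancies
prev = np.full_like(occ, 1000.0)                 # fixed previous rate
targets = np.array([alinea_rate(p, o) for p, o in zip(prev, occ)])

w, b, lr = 0.0, 0.0, 0.9
for epoch in range(2000):                        # "pre-training epochs"
    err = w * occ + b - targets
    w -= lr * 2.0 * np.mean(err * occ)
    b -= lr * 2.0 * np.mean(err)
```

After cloning, the policy's predictions should track the expert closely on this synthetic data; in the study, the cloned PPO network is then refined by RL, which is where over-long cloning risks overfitting to the expert.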