Impact of Pre-training on Deep Reinforcement Learning Ramp Metering Systems
Callum Evans (TU Delft - Traffic Systems Engineering)
Marco Rinaldi (TU Delft - Traffic Systems Engineering)
Henk Taale (TU Delft - Traffic Systems Engineering)
Serge Hoogendoorn (TU Delft - Traffic Systems Engineering)
Abstract
Pre-training is a process used to enhance the learning of deep reinforcement learning (RL) algorithms through initial guidance from an expert demonstrator: a neural network is first trained to replicate the outputs of the chosen expert before the RL agent is allowed to specialise and develop its own policy. This paper analyses the impact of pre-training on deep RL algorithms used for ramp metering. Specifically, behaviour cloning is performed for increasing durations (0-10,000 epochs), with ALINEA as the expert algorithm guiding a proposed Proximal Policy Optimisation (PPO)-based system. The results confirm that, for the same total training budget, some initial guidance through pre-training can significantly improve the system's effectiveness in reducing congestion compared to no pre-training. Conversely, excessive pre-training may lead to overfitting and reduced generalisability. Design issues resulting in weak model convergence, however, limit the algorithm's overall performance in the chosen scenario.
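To make the pipeline concrete, the sketch below shows the two ingredients the abstract names: the standard ALINEA feedback law, and behaviour cloning as supervised regression of a (here deliberately tiny, linear) policy onto ALINEA's metering rates. The gain, occupancy set-point, rate bounds, learning rate, and epoch count are illustrative placeholders, not the paper's actual parameters, and the real system would clone a deep PPO policy network rather than a linear model.

```python
import numpy as np

def alinea_rate(prev_rate, occ_meas, occ_target=0.25, K_R=70.0,
                r_min=200.0, r_max=1800.0):
    """One step of the ALINEA feedback law (rates in veh/h):
    r(k) = r(k-1) + K_R * (occ_target - occ_meas), clamped to bounds.
    K_R, occ_target and the bounds are illustrative values only."""
    rate = prev_rate + K_R * (occ_target - occ_meas)
    return max(r_min, min(r_max, rate))

# Behaviour cloning: fit a toy linear policy rate = w*occ + b to the
# expert's outputs by gradient descent on the mean squared error.
rng = np.random.default_rng(0)
occ = rng.uniform(0.0, 0.5, size=256)            # synthetic occupancies
prev = np.full_like(occ, 1000.0)                 # fixed previous rate
targets = np.array([alinea_rate(p, o) for p, o in zip(prev, occ)])

w, b, lr = 0.0, 0.0, 0.9
for epoch in range(2000):                        # "pre-training epochs"
    err = w * occ + b - targets
    w -= lr * 2.0 * np.mean(err * occ)
    b -= lr * 2.0 * np.mean(err)
```

After cloning, the policy's predictions should track the expert closely on this synthetic data; in the study, the cloned PPO network is then refined by RL, which is where over-long cloning risks overfitting to the expert.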