Practical implementation of reinforcement learning algorithms for giving personalised speed advice to cyclists approaching intersections using function approximation and Dyna

Abstract

Being a safe and healthy alternative to polluting and space-inefficient motorised vehicles, cycling can strongly improve living conditions in urban areas. Idling in front of traffic lights is seen as one of the major inconveniences of commuting by bicycle. By giving personalised speed advice, the probability of catching a green light can be increased whilst taking the cyclist's preferences into account. Due to its adaptive properties, reinforcement learning (\acs{RL}) is well suited to developing optimal speed advice policies when dealing with a dynamic traffic environment and unique cyclist preferences. Generally, a large number of training samples is required to successfully train an \acs{RL} algorithm. This poses a problem for this specific application, since training samples must be generated by humans and are therefore scarce. Moreover, exploration of the environment is challenging, since humans will not comply with irrational speed advice. These factors currently restrain the practical implementation of \acs{RL} algorithms for giving speed advice. This thesis aims to overcome these problems whilst maintaining a competitive performance compared to conventional \acs{RL} algorithms. This is done by using function approximators and a combined planning and learning method called Dyna. During a case study, three different function approximators are compared to reduce the number of required training samples, namely polynomial functions, radial basis functions, and artificial neural networks. Secondly, the effectiveness of Dyna in improving the quality of the speed advice in an unknown environment is assessed. Finally, these methods are applied in a framework focused on the practical implementation of \acs{RL} for giving speed advice. It was concluded that function approximation methods can significantly reduce the number of training samples required to train an \acs{RL} algorithm.
Dyna can increase user retention by providing cyclists with high-quality speed advice during the algorithm's early learning phase. Therefore, it can be concluded that this \acs{RL} approach for giving personalised speed advice to cyclists approaching intersections is practically implementable and can even outperform benchmark algorithms in terms of travel time, energy consumption, and safety.
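The Dyna method mentioned above interleaves learning from real experience with extra "planning" updates drawn from a learned model of the environment, so each scarce real sample is reused many times. As a minimal illustration of that idea (not the thesis implementation), the sketch below runs tabular Dyna-Q on a hypothetical toy corridor environment; the environment, hyperparameters, and state/action layout are all assumptions for the sake of a self-contained example.

```python
import random

def dyna_q(n_states=6, episodes=30, planning_steps=20,
           alpha=0.5, gamma=0.95, epsilon=0.3, seed=0):
    """Tabular Dyna-Q on a toy corridor: the agent starts in state 0 and
    gets reward 1 on reaching the rightmost state. Action 0 moves left,
    action 1 moves right. This is an illustrative stand-in for the
    cyclist/intersection environment, not the thesis setup."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}  # learned model: (state, action) -> (reward, next_state)

    def true_step(s, a):  # real environment dynamics
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    def greedy(s):  # greedy action, ties broken at random
        best = max(Q[s])
        return rng.choice([a for a, q in enumerate(Q[s]) if q == best])

    for _ in range(episodes):
        s = 0
        for _ in range(500):  # step cap keeps every episode finite
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            r, s2 = true_step(s, a)                        # real experience
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            model[(s, a)] = (r, s2)                        # update the model
            for _ in range(planning_steps):                # Dyna planning
                ps, pa = rng.choice(list(model))           # replay a seen pair
                pr, ps2 = model[(ps, pa)]
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
            s = s2
            if s == n_states - 1:
                break
    return Q

Q = dyna_q()
# After training, the greedy policy should head right in every non-terminal state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(5)]
```

The `planning_steps` loop is what distinguishes Dyna from plain Q-learning: with `planning_steps=0` the agent would need far more real interactions to propagate the reward signal, which mirrors the thesis's motivation for using Dyna when human-generated samples are scarce.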