Practical implementation of reinforcement learning algorithms for giving personalised speed advice to cyclists approaching intersections using function approximation and Dyna

Abstract

Being a safe and healthy alternative to polluting and space-inefficient motorised vehicles, cycling can strongly improve living conditions in urban areas. Idling in front of traffic lights is seen as one of the major inconveniences of commuting by bicycle. By giving personalised speed advice, the probability of catching a green light can be increased whilst taking the cyclist's preferences into account. Due to its adaptive properties, reinforcement learning (\acs{RL}) is well suited to developing optimal speed advice policies when dealing with a dynamic traffic environment and unique cyclist preferences. Generally, a large number of training samples is required to successfully train an \acs{RL} algorithm. This poses a problem for this specific application, since training samples must be generated by humans and are therefore scarce. Moreover, exploration of the environment is challenging, since humans will not comply with irrational speed advice. These factors currently restrain the practical implementation of \acs{RL} algorithms for giving speed advice. This thesis aims to overcome these problems whilst maintaining a competitive performance compared to conventional \acs{RL} algorithms. This is done by using function approximators and a combined planning and learning method called Dyna. During a case study, three different function approximators are compared to reduce the number of required training samples, namely polynomial functions, radial basis functions, and artificial neural networks. Secondly, the effectiveness of Dyna in improving the quality of the speed advice in an unknown environment is assessed. Finally, these methods are applied in a framework focused on the practical implementation of \acs{RL} for giving speed advice. It was concluded that function approximation methods can significantly reduce the number of training samples required to train an \acs{RL} algorithm.
Dyna can increase user retention by providing cyclists with high-quality speed advice during the algorithm's early learning phase. Therefore, it can be concluded that this \acs{RL} approach for giving personalised speed advice to cyclists approaching intersections is practically implementable and can even outperform benchmark algorithms in terms of travel time, energy consumption, and safety.
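The Dyna method mentioned above interleaves learning from real experience with extra "planning" updates drawn from a learned model of the environment, so each scarce real sample is reused many times. As a minimal illustration of that idea (not the thesis implementation), the sketch below runs tabular Dyna-Q on a hypothetical toy corridor environment; the environment, hyperparameters, and state/action layout are all assumptions for the sake of a self-contained example.

```python
import random

def dyna_q(n_states=6, episodes=30, planning_steps=20,
           alpha=0.5, gamma=0.95, epsilon=0.3, seed=0):
    """Tabular Dyna-Q on a toy corridor: the agent starts in state 0 and
    gets reward 1 on reaching the rightmost state. Action 0 moves left,
    action 1 moves right. This is an illustrative stand-in for the
    cyclist/intersection environment, not the thesis setup."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}  # learned model: (state, action) -> (reward, next_state)

    def true_step(s, a):  # real environment dynamics
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    def greedy(s):  # greedy action, ties broken at random
        best = max(Q[s])
        return rng.choice([a for a, q in enumerate(Q[s]) if q == best])

    for _ in range(episodes):
        s = 0
        for _ in range(500):  # step cap keeps every episode finite
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            r, s2 = true_step(s, a)                        # real experience
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            model[(s, a)] = (r, s2)                        # update the model
            for _ in range(planning_steps):                # Dyna planning
                ps, pa = rng.choice(list(model))           # replay a seen pair
                pr, ps2 = model[(ps, pa)]
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
            s = s2
            if s == n_states - 1:
                break
    return Q

Q = dyna_q()
# After training, the greedy policy should head right in every non-terminal state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(5)]
```

The `planning_steps` loop is what distinguishes Dyna from plain Q-learning: with `planning_steps=0` the agent would need far more real interactions to propagate the reward signal, which mirrors the thesis's motivation for using Dyna when human-generated samples are scarce.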