Tabular Reinforcement Learning Aided by Generalisation Methods


Abstract

Reinforcement learning is a machine learning paradigm that learns to solve optimisation problems by interacting with an environment. Tabular reinforcement learning methods are popular because of their relative simplicity combined with strong guarantees of finding an optimal solution. Their downside is an exploration space that grows exponentially with problem size, which obstructs learning in large tasks. Function approximation is the most commonly used alternative for dealing with this problem. It interpolates values for states that have never been encountered before by learning the values of generalising features. This generalising property makes larger problems tractable and allows for faster learning. Unfortunately, it can be difficult to specify the features required to learn a solution, and convergence cannot always be guaranteed. Generalisation at an even higher level is called transfer learning, where knowledge acquired in one or more earlier tasks is reused to aid the learning process in the next task. This thesis proposes a framework that combines tabular reinforcement learning methods with both of these generalising concepts to achieve a convergent learning process with good generalisation properties. To test the viability of the proposed method, it is used to solve discrete path planning problems. The results of these tests show that simultaneous learning with the help of function approximation in a parallel learning process can be leveraged to achieve a significant reduction in the number of steps needed in both the first and subsequent tasks.
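To make the combination described above concrete, the following is a minimal sketch, not the framework developed in the thesis: a tabular Q-learning agent on a small corridor task whose updates are mirrored by a linear function approximator learning in parallel, with the approximator's generalised estimates used as a fallback for state-action pairs the table has not yet visited. The task, the features, and the fallback rule are all illustrative assumptions.

import numpy as np

# Hypothetical 1-D corridor task: states 0..N-1, goal at the right end.
# Actions: 0 = left, 1 = right. Reward 1.0 on reaching the goal, else 0.
N_STATES, N_ACTIONS = 20, 2
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def features(s, a):
    """Simple generalising features: bias, normalised position, action flag."""
    return np.array([1.0, s / (N_STATES - 1), float(a)])

Q = np.zeros((N_STATES, N_ACTIONS))  # tabular estimate (convergent)
w = np.zeros(3)                      # linear approximator (generalising)

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

rng = np.random.default_rng(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        if rng.random() < EPS:
            a = int(rng.integers(N_ACTIONS))
        else:
            # Act greedily on the tabular values; for pairs the table has
            # not yet learned about, fall back on the approximator's
            # generalised estimate (a crude "unvisited" heuristic here).
            vals = [Q[s, b] if Q[s, b] != 0.0 else features(s, b) @ w
                    for b in range(N_ACTIONS)]
            a = int(np.argmax(vals))
        s2, r, done = step(s, a)
        target = r + GAMMA * Q[s2].max() * (not done)
        Q[s, a] += ALPHA * (target - Q[s, a])  # tabular update
        # Parallel update of the approximator from the same experience.
        w += ALPHA * (target - features(s, a) @ w) * features(s, a)
        s = s2

In this sketch the table remains the authoritative, convergent estimate, while the linear learner provides generalised guesses that can speed up exploration of unvisited states; reusing the learned weights w as the starting point for a new corridor task would play the role of transfer learning.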