Adaptive Reinforcement Learning

Increasing the applicability for large and time-varying systems using parallel Gaussian Process regression and adaptive nonlinear control

Abstract

This thesis investigates the applicability of the Probabilistic Inference for Learning COntrol (PILCO) algorithm to large systems and systems with time-varying measurement noise. PILCO is a state-of-the-art model-learning Reinforcement Learning (RL) algorithm that uses a Gaussian Process (GP) model to average over uncertainties during learning. Simulated case studies on a second-order system and a cart-pole system show that both the Radial Basis Function (RBF) controller and the GP controller find good solutions when the number of basis functions is chosen correctly. However, when a high number of basis functions is selected, the RBF controller fails completely, while the GP controller is still able to find a suboptimal solution. To reduce the computational time for large systems, the identification of the GP model is parallelized. For a four-dimensional model, this parallelization reduces the identification time by 20 to 40 percent. A simulated case study of a cart-pole system shows a strong decrease in performance when the variance or kurtosis of the measurement noise increases, while the controller is robust to changes in its skewness. Furthermore, the measurement noise variance is an important parameter, because it has to be fixed as a parameter of the GP controller prior to learning. Therefore, Adaptive-Probabilistic Inference for Learning COntrol (A-PILCO) is proposed: a framework that initiates a new learning process when the measurement noise variance exceeds its confidence bounds. By significantly reducing the computational time for large and/or complex systems and by implementing the A-PILCO framework, the PILCO algorithm becomes applicable to a larger set of systems.
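
The parallelization exploits the fact that a multivariate GP dynamics model is typically trained as one independent GP per output dimension, so the per-dimension hyperparameter optimizations can run concurrently. Below is a minimal sketch of this idea, assuming a scikit-learn-style GP, a state-action training matrix `X`, and a target matrix `Y` with one column per state dimension; the function names are illustrative and not taken from the thesis implementation.

```python
# Hypothetical sketch: train one GP per output dimension in parallel.
# The use of scikit-learn and all names here are illustrative assumptions.
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def _fit_single_gp(args):
    """Fit a GP mapping state-action pairs X to one output dimension y."""
    X, y = args
    kernel = RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel()
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X, y)
    return gp

def train_dynamics_gps(X, Y, n_workers=4):
    """Each of the D output dimensions is an independent regression
    problem, so the D hyperparameter optimizations run in parallel."""
    jobs = [(X, Y[:, d]) for d in range(Y.shape[1])]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(_fit_single_gp, jobs))
```

Because the output dimensions share no parameters, the speedup is bounded only by the number of dimensions and available cores, which is consistent with the 20 to 40 percent reduction reported for a four-dimensional model.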
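The A-PILCO trigger condition can be made concrete as a statistical test: under Gaussian measurement noise, the scaled sample variance of recent residuals follows a chi-square distribution, and a new learning process is initiated when it leaves its confidence interval. The sketch below is one plausible instance of this mechanism, assuming Gaussian noise and a sliding window of residuals; it is not the thesis code.

```python
# Hypothetical sketch of the A-PILCO trigger: estimate the measurement
# noise variance from recent residuals and restart learning when the
# estimate leaves the confidence bounds of the variance assumed at
# training time. The chi-square interval and all names are assumptions.
import numpy as np
from scipy.stats import chi2

def noise_variance_in_bounds(residuals, sigma2_train, alpha=0.01):
    """Two-sided chi-square test: does the sample variance of the
    residuals still agree with the training-time noise variance?"""
    n = len(residuals)
    s2 = np.var(residuals, ddof=1)
    stat = (n - 1) * s2 / sigma2_train
    lo, hi = chi2.ppf([alpha / 2, 1 - alpha / 2], df=n - 1)
    return lo <= stat <= hi

# Usage: monitor a sliding window of residuals during operation.
# if not noise_variance_in_bounds(window, sigma2_train):
#     ...  # initiate a new PILCO learning process with the new variance
```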