Adaptive Reinforcement Learning

Increasing the applicability for large and time-varying systems using parallel Gaussian Process regression and adaptive nonlinear control

Abstract

This thesis investigates the applicability of the Probabilistic Inference for Learning COntrol (PILCO) algorithm to large systems and systems with time-varying measurement noise. PILCO is a state-of-the-art model-learning Reinforcement Learning (RL) algorithm that uses a Gaussian Process (GP) model to average over uncertainties during learning. Simulated case studies on a second-order system and a cart-pole system show that both the Radial Basis Function (RBF) controller and the GP controller find good solutions when the number of basis functions is chosen correctly. However, when a high number of basis functions is selected, the RBF controller fails completely, while the GP controller is still able to find a suboptimal solution. To reduce the computational time for large systems, the identification of the GP model is parallelized. For a four-dimensional model, this parallelization reduces the identification time by 20 to 40 percent. A simulated case study of a cart-pole system shows a strong decrease in performance when the variance or kurtosis of the measurement noise increases, while the controller is robust to changes in its skewness. Furthermore, the measurement noise variance is an important parameter, because it has to be fixed as a parameter of the GP controller prior to learning. Therefore, Adaptive-Probabilistic Inference for Learning COntrol (A-PILCO) is proposed: a framework that initiates a new learning process when the measurement noise variance exceeds its confidence bounds. By significantly reducing the computational time for large and/or complex systems and by implementing the A-PILCO framework, the PILCO algorithm becomes applicable to a larger set of systems.
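
The parallelization exploits the fact that a multivariate GP dynamics model is typically trained as one independent GP per output dimension, so the per-dimension hyperparameter optimizations can run concurrently. Below is a minimal sketch of this idea, assuming a scikit-learn-style GP, a state-action training matrix `X`, and a target matrix `Y` with one column per state dimension; the function names are illustrative and not taken from the thesis implementation.

```python
# Hypothetical sketch: train one GP per output dimension in parallel.
# The use of scikit-learn and all names here are illustrative assumptions.
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def _fit_single_gp(args):
    """Fit a GP mapping state-action pairs X to one output dimension y."""
    X, y = args
    kernel = RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel()
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X, y)
    return gp

def train_dynamics_gps(X, Y, n_workers=4):
    """Each of the D output dimensions is an independent regression
    problem, so the D hyperparameter optimizations run in parallel."""
    jobs = [(X, Y[:, d]) for d in range(Y.shape[1])]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(_fit_single_gp, jobs))
```

Because the output dimensions share no parameters, the speedup is bounded only by the number of dimensions and available cores, which is consistent with the 20 to 40 percent reduction reported for a four-dimensional model.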
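The A-PILCO trigger condition can be made concrete as a statistical test: under Gaussian measurement noise, the scaled sample variance of recent residuals follows a chi-square distribution, and a new learning process is initiated when it leaves its confidence interval. The sketch below is one plausible instance of this mechanism, assuming Gaussian noise and a sliding window of residuals; it is not the thesis code.

```python
# Hypothetical sketch of the A-PILCO trigger: estimate the measurement
# noise variance from recent residuals and restart learning when the
# estimate leaves the confidence bounds of the variance assumed at
# training time. The chi-square interval and all names are assumptions.
import numpy as np
from scipy.stats import chi2

def noise_variance_in_bounds(residuals, sigma2_train, alpha=0.01):
    """Two-sided chi-square test: does the sample variance of the
    residuals still agree with the training-time noise variance?"""
    n = len(residuals)
    s2 = np.var(residuals, ddof=1)
    stat = (n - 1) * s2 / sigma2_train
    lo, hi = chi2.ppf([alpha / 2, 1 - alpha / 2], df=n - 1)
    return lo <= stat <= hi

# Usage: monitor a sliding window of residuals during operation.
# if not noise_variance_in_bounds(window, sigma2_train):
#     ...  # initiate a new PILCO learning process with the new variance
```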