Lost in abstraction
Exploring our way to efficient reinforcement learning
Rolf Starre (TU Delft - Sequential Decision Making)
FA Oliehoek – Promotor (TU Delft - Sequential Decision Making)
M. Loog – Promotor (Radboud Universiteit Nijmegen, TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Reinforcement Learning (RL) methods aim to find near-optimal solutions to sequential decision-making problems with initially unknown dynamics. These methods learn by interacting with the environment and observing the outcomes of their actions. RL has made significant progress in recent years, and good solutions to difficult problems have been found in rapid succession. However, these successes often rely on access to a simulator, which makes it possible to generate large amounts of experience cheaply and safely. In contrast, many real-world applications of RL require learning solely from experience obtained in the environment itself, which is often time-consuming and expensive and carries risks such as damage to equipment. This makes it crucial to collect and use experience efficiently. This thesis focuses on improving the learning efficiency of RL methods.
Two approaches to improving learning efficiency are Model-based Reinforcement Learning (MBRL) and state abstraction. MBRL methods learn a model of the environment and use it for planning and learning, which enables efficient learning by directing exploration toward unknown areas of a problem. State abstraction, in contrast, reduces the size of a problem, achieving efficient learning in a complementary way.
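To illustrate how a learned model can direct exploration toward unknown areas, the following is a minimal sketch of a count-based tabular model with R-max-style optimism. The class name, threshold, and reward bound are illustrative assumptions, not the thesis's own method.

```python
from collections import defaultdict

class TabularModel:
    """Count-based model; under-visited state-action pairs are treated optimistically."""

    def __init__(self, known_threshold=5, r_max=1.0):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': n}
        self.reward_sum = defaultdict(float)                 # (s, a) -> total reward
        self.m = known_threshold                             # visits before a pair is "known"
        self.r_max = r_max                                   # upper bound on the reward

    def update(self, s, a, r, s_next):
        """Record one observed transition (s, a) -> s_next with reward r."""
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r

    def estimate(self, s, a):
        """Return (transition distribution, mean reward); optimistic if unknown."""
        n = sum(self.counts[(s, a)].values())
        if n < self.m:
            # Unknown pair: a self-loop paying r_max draws the planner toward it.
            return {s: 1.0}, self.r_max
        dist = {s2: c / n for s2, c in self.counts[(s, a)].items()}
        return dist, self.reward_sum[(s, a)] / n
```

Planning with this model makes under-explored state-action pairs look maximally rewarding, so the agent is steered toward exactly the parts of the problem it knows least about.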
This thesis focuses on combining these two methods, aiming for even greater learning efficiency. We first survey methods that have previously combined MBRL and abstraction, ranging from state aggregation to abstractions based on deep learning. We identify challenges that result from combining MBRL and abstraction, focusing in particular on the view of RL with abstraction as a partially observable problem. From this perspective, we demonstrate how the combination leads to perceptual aliasing, where different states are perceived as the same state. As a result, the observed behavior is no longer guaranteed to satisfy the assumptions, such as the Markov property, on which most analyses rely.
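Perceptual aliasing can be made concrete with a toy example; all states and probabilities below are hypothetical. Two ground states with different dynamics map to the same abstract state, so the same observation precedes different futures and the abstract observation stream is no longer Markov.

```python
# Abstraction (state aggregation): s0 and s1 both map to abstract state "A".
phi = {"s0": "A", "s1": "A", "s2": "B"}

# Ground transition distributions under a fixed action: s0 always moves to s2,
# while s1 always moves back to s0. The ground process itself is Markov.
P = {"s0": {"s2": 1.0}, "s1": {"s0": 1.0}}

# Project the dynamics through the abstraction: both states are observed as
# "A", yet from s0 the next observation is "B" and from s1 it is "A" again.
next_abstract = {s: {phi[s2]: p for s2, p in dist.items()} for s, dist in P.items()}

assert phi["s0"] == phi["s1"]                      # same observation...
assert next_abstract["s0"] != next_abstract["s1"]  # ...different abstract dynamics
```

An agent that only sees abstract states cannot tell which ground state it is in, so the next abstract state is not determined by the current observation alone.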
Next, this thesis addresses the issue of perceptual aliasing through a theoretical analysis of MBRL with abstracted observations. While many algorithms come with performance guarantees in the absence of abstraction, it may come as a surprise that no such guarantees were available for the combination of MBRL and abstraction, where the MBRL method observes only abstract states. We prove that, even in this setting, it is still possible to guarantee that an accurate model can be learned. Building on this result, we extend the performance guarantees of MBRL methods to learning with abstract observations.
Finally, we shift our focus to partially observable problems. Previously, we assumed the problems were fully observable and that only the abstraction rendered them partially observable; many complex problems, however, are partially observable by nature. A difficulty in these problems is the size of the belief space the agent must reason about, which typically makes exact solutions intractable. Online planning, in which actions are chosen within a limited amount of time, is often used as an alternative. In this setting, abstraction can provide an additional benefit: because it reduces the size of the model, it can increase planning speed.
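The idea of online planning under a fixed simulation budget can be sketched as follows; the simulator, rollout policy, and budget below are illustrative assumptions, not the thesis's planner. The agent spreads a fixed number of Monte Carlo rollouts over the available actions and picks the action with the best average return. A smaller abstract model makes each simulation cheaper, so more rollouts fit in the same budget.

```python
import random

def rollout(simulate, state, action, depth=10):
    """Simulate one rollout of `depth` steps; return the cumulative reward."""
    total, s, a = 0.0, state, action
    for _ in range(depth):
        s, r = simulate(s, a)
        total += r
        a = random.choice([0, 1])  # random rollout policy after the first step
    return total

def plan(simulate, state, actions=(0, 1), budget=100):
    """Spread a fixed simulation budget over the actions; return the best one."""
    per_action = budget // len(actions)
    means = {a: sum(rollout(simulate, state, a) for _ in range(per_action)) / per_action
             for a in actions}
    return max(means, key=means.get)

# Toy simulator: the state never changes; action 1 yields reward 1, action 0 yields 0.
toy = lambda s, a: (s, float(a))
```

With a faster (e.g. abstract) simulator, `budget` can be larger for the same wall-clock time, which is the mechanism by which abstraction can improve online-planning performance.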
We propose and investigate an abstraction method that uses the structure of the problem to define different levels of abstraction. We evaluate our approach empirically in several domains and find that abstract models can indeed enable faster planning, which can increase performance even when the abstraction leads to a loss of information. Furthermore, we show that abstraction can improve performance even under a fixed number of simulations. This occurs because abstract models can aggregate samples that the original model treats independently, thereby using experience more efficiently.
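The sample-aggregation effect can be illustrated with a small count-based example; the states and abstraction below are hypothetical. Two ground states that share an abstract state pool their samples, so each abstract estimate is backed by more data than the corresponding ground estimate.

```python
from collections import Counter

phi = {"s0": "A", "s1": "A"}  # both ground states share abstract state "A"

# Observed transitions: (state, next_state) pairs gathered from the environment.
experience = [("s0", "g"), ("s1", "g"), ("s0", "g"), ("s1", "b")]

ground_counts = Counter(s for s, _ in experience)         # samples per ground state
abstract_counts = Counter(phi[s] for s, _ in experience)  # samples pooled per abstract state

print(ground_counts["s0"], abstract_counts["A"])  # prints "2 4"
```

Each ground state's statistics rest on only two samples here, while the abstract state's rest on all four, which is why abstract models can make better use of the same experience.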
This thesis theoretically and empirically shows that we can learn efficiently by combining MBRL and abstraction. The results of this investigation advance our understanding of this combination, furthering knowledge in this important area of research and providing a foundation that can support effective learning in complex real-world problems.