R.A.N. Starre | TU Delft Repository

Lost in abstraction

Exploring our way to efficient reinforcement learning

Doctoral thesis (2025) - R.A.N. Starre, F.A. Oliehoek, M. Loog

Reinforcement Learning (RL) methods aim to find near-optimal solutions to sequential decision-making problems with initially unknown dynamics. These methods learn by interacting with the environment and observing the outcomes of their actions. RL methods have made significant progress in recent years and good solutions to difficult problems have been found in rapid succession. However, these successes often rely on access to a simulator, which makes it possible to generate a lot of experience cheaply and safely. In contrast, there are many real-world applications of RL where learning must occur solely through experience obtained in the environment itself. This is often time-consuming and expensive, with risks such as damage to equipment. This makes efficiently collecting and using experience of crucial importance. The thesis focuses on improving the learning efficiency of RL methods.

Two methods to improve learning efficiency are Model-based Reinforcement Learning (MBRL) and state abstraction. MBRL methods learn a model and use it for planning and learning, which drives efficient learning by directing exploration to unknown areas of a problem. On the other hand, state abstraction reduces the size of a problem, which achieves efficient learning in an alternative way.

This thesis focuses on combining these two methods, aiming to achieve even greater learning efficiency. We first survey methods that have previously combined MBRL and abstraction, including approaches ranging from state aggregation to abstractions based on deep learning. We identify challenges resulting from the combination of MBRL and abstraction, particularly focusing on the view of RL plus abstraction as a partially observable problem. From this perspective, we demonstrate how this combination leads to perceptual aliasing, where different states are perceived as the same state. This implies the observed behavior is no longer guaranteed to adhere to the assumptions required for most analyses.
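
As a rough illustration of perceptual aliasing (not taken from the thesis), the sketch below uses a hypothetical four-state deterministic chain and a hand-picked aggregation `PHI`. All names and numbers are assumptions made for this example. Ground states 0 and 1 behave differently but share one abstract state, so an agent that only sees abstract states cannot predict the outcome from that abstract state alone.

```python
from collections import defaultdict

# Hypothetical deterministic ground MDP: state -> next state under a single action.
GROUND_NEXT = {0: 1, 1: 2, 2: 3, 3: 3}

# Hypothetical state aggregation: ground states 0 and 1 share abstract state "A".
PHI = {0: "A", 1: "A", 2: "B", 3: "C"}


def abstract_successors(ground_next, phi):
    """For each abstract state, collect the abstract successors of its member states."""
    successors = defaultdict(set)
    for state, next_state in ground_next.items():
        successors[phi[state]].add(phi[next_state])
    return dict(successors)


def aliased_states(ground_next, phi):
    """Abstract states whose successor depends on the hidden ground state."""
    successors = abstract_successors(ground_next, phi)
    return sorted(a for a, succ in successors.items() if len(succ) > 1)


if __name__ == "__main__":
    print(abstract_successors(GROUND_NEXT, PHI))
    print(aliased_states(GROUND_NEXT, PHI))
```

In this toy case only abstract state "A" is aliased: from "A" the agent sometimes stays in "A" and sometimes moves to "B", depending on a ground state it never observes.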

Next, this thesis addresses the issue of perceptual aliasing with a theoretical analysis of the combination of MBRL and abstracted observations. While there are many algorithms with performance guarantees without abstraction, it may come as a surprise that no such guarantees are available when combining MBRL and abstraction, where MBRL merely observes abstract states. We prove that, even in this context, it is still possible to guarantee that an accurate model can be learned. Based on this result, we extend the performance guarantees of MBRL methods to learning with abstract observations.

Finally, we shift our focus to partially observable problems. Previously, we assumed the problems were fully observable and it was only the abstraction that rendered them partially observable. However, many complex problems are partially observable by nature. A difficulty in these problems is the belief space the agent needs to reason about, which is typically too large to find an exact solution. Online planning, which involves choosing actions within a limited amount of time, is often used as an alternative for finding solutions. In this setting, abstraction can provide additional benefits by potentially increasing the planning speed, since it reduces the size of the model.

We propose and investigate an abstraction method that uses the structure of the problem to define different levels of abstraction. We evaluate our approach empirically in several domains and find that abstract models can indeed enable faster planning which can increase performance, even when the abstraction leads to a loss of information. Further, we show that abstractions can improve performance even under a fixed number of simulations. This occurs because abstract models can aggregate multiple samples that the original model treats independently, thereby using experience more efficiently.

This thesis theoretically and empirically shows that we can learn efficiently by combining MBRL and abstraction. The results of this investigation advance our understanding of this combination, furthering knowledge in this important area of research and providing a foundation that can support effective learning in complex real-world problems.

An Analysis of Model-Based Reinforcement Learning From Abstracted Observations

Journal article (2023) - R.A.N. Starre, M. Loog, E. Congeduti, F.A. Oliehoek

Many methods for Model-based Reinforcement learning (MBRL) in Markov decision processes (MDPs) provide guarantees for both the accuracy of the model they can deliver and the learning efficiency. At the same time, state abstraction techniques allow for a reduction of the size of an MDP while maintaining a bounded loss with respect to the original problem. Therefore, it may come as a surprise that no such guarantees are available when combining both techniques, i.e., where MBRL merely observes abstract states. Our theoretical analysis shows that abstraction can introduce a dependence between samples collected online (e.g., in the real world). That means that, without taking this dependence into account, results for MBRL do not directly extend to this setting. Our result shows that we can use concentration inequalities for martingales to overcome this problem. This result makes it possible to extend the guarantees of existing MBRL algorithms to the setting with abstraction. We illustrate this by combining R-MAX, a prototypical MBRL algorithm, with abstraction, thus producing the first performance guarantees for model-based ‘RL from Abstracted Observations’: model-based reinforcement learning with an abstract model. ...
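
The bookkeeping behind learning a model from abstracted observations can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the article: the class name, the abstraction `phi`, and the `known_threshold` parameter are hypothetical. It only shows how transition counts are kept per abstract state and how an R-MAX-style "known" test gates the empirical estimate. The article's contribution concerns why such estimates remain accurate even though the samples are dependent, which this sketch does not address.

```python
from collections import defaultdict


class AbstractCountModel:
    """Empirical transition model over abstract states with an R-MAX-style 'known' test."""

    def __init__(self, phi, known_threshold):
        self.phi = phi  # hypothetical abstraction: ground state -> abstract state
        self.known_threshold = known_threshold  # visits required before trusting estimates
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, action, next_state):
        """Record one transition; only the abstract states are stored."""
        self.counts[(self.phi(state), action)][self.phi(next_state)] += 1

    def visits(self, abstract_state, action):
        """Number of recorded transitions from this abstract state-action pair."""
        return sum(self.counts[(abstract_state, action)].values())

    def is_known(self, abstract_state, action):
        return self.visits(abstract_state, action) >= self.known_threshold

    def transition_estimate(self, abstract_state, action):
        """Empirical next-abstract-state distribution, or None while still unknown."""
        if not self.is_known(abstract_state, action):
            return None  # R-MAX would treat this pair optimistically instead
        total = self.visits(abstract_state, action)
        return {s: c / total for s, c in self.counts[(abstract_state, action)].items()}


if __name__ == "__main__":
    model = AbstractCountModel(phi=lambda s: s // 2, known_threshold=4)
    for s, a, s_next in [(0, "a", 1), (1, "a", 2), (0, "a", 1), (1, "a", 3)]:
        model.update(s, a, s_next)
    print(model.transition_estimate(0, "a"))
```

Note that the four transitions above start in two different ground states (0 and 1) yet are pooled into one abstract estimate, which is exactly where the dependence between samples can enter.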

Model-Based Reinforcement Learning with State Abstraction: A Survey

Conference paper (2022) - R.A.N. Starre, M. Loog, F.A. Oliehoek

Model-based reinforcement learning methods are promising since they can increase sample efficiency while simultaneously improving generalizability. Learning can also be made more efficient through state abstraction, which delivers more compact models. Model-based reinforcement learning methods have been combined with learning abstract models to profit from both effects. We consider a wide range of state abstractions that have been covered in the literature, from straightforward state aggregation to deep learned representations, and sketch challenges that arise when combining model-based reinforcement learning with abstraction. We further show how various methods deal with these challenges and point to open questions and opportunities for further research. ...

Influence-aware memory architectures for deep reinforcement learning in POMDPs

Journal article (2022) - Miguel Suau, Jinke He, Elena Congeduti, Rolf Starre, Aleksander Czechowski, Frans A. Oliehoek

Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high dimensional data. In this paper, we propose influence-aware memory, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q values is inevitably fed back into the network for the next prediction, our model allows information to flow without being necessarily stored in the RNN’s internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability. ...

Comparing Exploration Approaches in Deep Reinforcement Learning for Traffic Light Control

Conference paper (2020) - Y. Oren, R.A.N. Starre, F.A. Oliehoek

Identifying the most efficient exploration approach for deep reinforcement learning in traffic light control is not a trivial task, and can be a critical step in the development of reinforcement learning solutions that can effectively reduce traffic congestion. It is common to use baseline dithering methods such as ε-greedy. However, the value of more evolved exploration approaches in this setting has not yet been determined. This paper addresses this concern by comparing the performance of the popular deep Q-learning algorithm using one baseline and two state-of-the-art exploration approaches, and their combination. Specifically, ε-greedy is used as a baseline, and compared to the exploration approaches Bootstrapped DQN, randomized prior functions, and their combination. This is done in three different traffic scenarios, capturing different traffic profiles. The results obtained suggest that the higher the complexity of the traffic scenario, and the larger the size of the observation space of the agent, the larger the gain from efficient exploration. This is illustrated by the improved performance observed in the agents using efficient exploration and enjoying a larger observation space in the complex traffic scenarios. ...
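
For readers unfamiliar with the ε-greedy (epsilon-greedy) baseline, the following is a minimal generic sketch of the action-selection rule. It is not code from the paper: the function name and the list-of-Q-values interface are assumptions for illustration, and ties are broken by taking the first maximal action.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: ignore the value estimates
    return q_values.index(max(q_values))  # exploit: first action with the highest value


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]  # hypothetical Q-value estimates for three actions
    print(epsilon_greedy(q, epsilon=0.1))
```

Because the random action is chosen without regard to uncertainty or value estimates, this kind of dithering explores undirectedly, which is the property the more advanced approaches in the comparison aim to improve on.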

Influence-Based Abstraction in Deep Reinforcement Learning

Conference paper (2019) - Miguel Suau de Castro, Elena Congeduti, Rolf Starre, Aleksander Czechowski, Frans Oliehoek

... thousands, or even millions of state variables. Unfortunately, applying reinforcement learning algorithms to handle complex tasks becomes more and more challenging as the number of state variables increases. In this paper, we build on the concept of influence-based abstraction which tries to tackle such scalability issues by decomposing large systems into small regions. We explore this method in the context of deep reinforcement learning, showing that by keeping track of a small set of variables in the history of previous actions and observations we can learn policies that can effectively control a local region in the global system. ...