R.A.N. Starre
Lost in abstraction
Exploring our way to efficient reinforcement learning
Two methods to improve learning efficiency are Model-based Reinforcement Learning (MBRL) and state abstraction. MBRL methods learn a model and use it for planning and learning, which drives efficient learning by directing exploration to unknown areas of a problem. State abstraction, by contrast, reduces the size of the problem, which makes learning more efficient in a different way.
This thesis focuses on combining these two methods, aiming to achieve even greater learning efficiency. We first survey methods that have previously combined MBRL and abstraction, including approaches ranging from state aggregation to abstractions based on deep learning. We identify challenges resulting from the combination of MBRL and abstraction, particularly focusing on the view of RL plus abstraction as a partially observable problem. From this perspective, we demonstrate how this combination leads to perceptual aliasing, where different states are perceived as the same state. This implies the observed behavior is no longer guaranteed to adhere to the assumptions required for most analyses.
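The perceptual aliasing described above can be illustrated with a toy example. The sketch below is not taken from the thesis: the four-state chain, the aggregation map `phi`, and all function names are hypothetical. It shows that deterministic ground dynamics can look stochastic at the abstract level, and that the abstract process stops being Markov because the previous abstract state carries information about the hidden ground state.

```python
from collections import Counter, defaultdict

# Hypothetical aggregation: ground states 0,1 map to "A"; 2,3 map to "B".
phi = {0: "A", 1: "A", 2: "B", 3: "B"}

# Hypothetical deterministic ground dynamics under one action: 0 -> 1 -> 2 -> 3 -> 0.
step = {0: 1, 1: 2, 2: 3, 3: 0}

def abstract_trace(start, n_steps):
    """Return the sequence of abstract states the agent observes."""
    s = start
    trace = [phi[s]]
    for _ in range(n_steps):
        s = step[s]
        trace.append(phi[s])
    return trace

def first_order_counts(trace):
    """Count next abstract states given only the current abstract state."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(trace, trace[1:]):
        counts[cur][nxt] += 1
    return counts

def second_order_counts(trace):
    """Count next abstract states given the previous and current abstract states."""
    counts = defaultdict(Counter)
    for prev, cur, nxt in zip(trace, trace[1:], trace[2:]):
        counts[(prev, cur)][nxt] += 1
    return counts

trace = abstract_trace(start=0, n_steps=8)
first = first_order_counts(trace)
second = second_order_counts(trace)

# Given only "A", the next abstract state looks random.
print(dict(first["A"]))
# Given the history ("A", "A"), the next abstract state is fully determined.
print(dict(second[("A", "A")]))
```

In this toy chain the ground states 0 and 1 are aliased: both appear as "A", yet they lead to different abstract successors, so an estimate that conditions only on "A" mixes two different behaviors.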
Next, this thesis addresses the issue of perceptual aliasing with a theoretical analysis of the combination of MBRL and abstracted observations. While there are many algorithms with performance guarantees without abstraction, it may come as a surprise that no such guarantees are available when combining MBRL and abstraction, where MBRL merely observes abstract states. We prove that, even in this context, it is still possible to guarantee that an accurate model can be learned. Based on this result, we extend the performance guarantees of MBRL methods to learning with abstract observations.
Finally, we shift our focus to partially observable problems. Previously, we assumed the problems were fully observable and it was only the abstraction that rendered them partially observable. However, many complex problems are partially observable by nature. A difficulty in these problems is the belief space the agent needs to reason about, which is typically too large to find an exact solution. Online planning, which involves choosing actions within a limited amount of time, is often used as an alternative for finding solutions. In this setting, abstraction can provide additional benefits by potentially increasing the planning speed, since it reduces the size of the model.
We propose and investigate an abstraction method that uses the structure of the problem to define different levels of abstraction. We evaluate our approach empirically in several domains and find that abstract models can indeed enable faster planning, which can increase performance even when the abstraction leads to a loss of information. Further, we show that abstractions can improve performance even under a fixed number of simulations. This occurs because abstract models can aggregate multiple samples that the original model treats independently, thereby using experience more efficiently.
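The sample-aggregation effect can be sketched with a simple counting argument. The example below is an illustration only, not the abstraction method of the thesis: the budget of 24 simulations, the 12 observations, and the grouping rule `o // 4` are all hypothetical choices.

```python
from collections import Counter

def visits_per_node(observations, abstraction=None):
    """Count how many simulations reach each node, keyed by (abstract) observation."""
    key = abstraction if abstraction is not None else (lambda o: o)
    return Counter(key(o) for o in observations)

# Hypothetical fixed budget: 24 simulations cycling through 12 distinct observations.
simulated_observations = [i % 12 for i in range(24)]

# Without abstraction, every observation gets its own node and its own statistics.
ground = visits_per_node(simulated_observations)

# With a hypothetical abstraction that groups observations in blocks of four,
# the same simulations are pooled into fewer nodes.
abstract = visits_per_node(simulated_observations, abstraction=lambda o: o // 4)

print(len(ground), min(ground.values()))
print(len(abstract), min(abstract.values()))
```

With the same number of simulations, each abstract node is backed by more samples than any ground node, so its estimates are less noisy. The price is that observations within a group can no longer be told apart.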
This thesis theoretically and empirically shows that we can learn efficiently by combining MBRL and abstraction. The results of this investigation advance our understanding of this combination, furthering knowledge in this important area of research and providing a foundation that can support effective learning in complex real-world problems.
account, results for MBRL do not directly extend to this setting. Our result shows that we can use concentration inequalities for martingales to overcome this problem. This result makes it possible to extend the guarantees of existing MBRL algorithms to the setting with abstraction. We illustrate this by combining R-MAX, a prototypical MBRL algorithm, with abstraction, thus producing the first performance guarantees for model-based ‘RL from Abstracted Observations’: model-based reinforcement learning with an abstract model.
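The abstract does not say which martingale concentration inequality is used. A standard example of this family, given here purely as an assumption for illustration, is the Azuma-Hoeffding inequality. It bounds the deviation of a martingale with bounded increments and, unlike Hoeffding's inequality for sums of independent variables, does not require the increments to be independent.

```latex
% Azuma-Hoeffding inequality (illustrative; not necessarily the bound used in the paper).
% Let X_0, X_1, \dots, X_n be a martingale with bounded increments
% |X_k - X_{k-1}| \le c_k for k = 1, \dots, n. Then for every \epsilon > 0:
\[
  \Pr\bigl( \lvert X_n - X_0 \rvert \ge \epsilon \bigr)
  \;\le\;
  2 \exp\!\left( - \frac{\epsilon^{2}}{2 \sum_{k=1}^{n} c_k^{2}} \right).
\]
```

Bounds of this form show how deviations can be controlled when each sample may depend on the history that preceded it.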
Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNNs) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high-dimensional data. In this paper, we propose influence-aware memory, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q-values is inevitably fed back into the network for the next prediction, our model allows information to flow without necessarily being stored in the RNN's internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures in both training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.
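The two-branch idea can be sketched as a toy forward pass in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name, layer sizes, fixed weights, and the hand-picked `memory_idx` are hypothetical, and letting the feedforward branch see the full observation (rather than only the remaining variables) is also an assumption.

```python
import math

def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

class InfluenceAwareMemorySketch:
    """Toy model: a memoryless feedforward branch plus a recurrent branch
    that only receives a chosen subset of the observation variables."""

    def __init__(self, obs_dim, memory_idx, hidden=3, n_actions=2):
        self.memory_idx = memory_idx
        d = len(memory_idx)
        # Fixed illustrative weights (hypothetical; a real model would learn them).
        self.W_ff = [[0.1 * (i + j + 1) for j in range(obs_dim)] for i in range(hidden)]
        self.W_in = [[0.2 * (i - j) for j in range(d)] for i in range(hidden)]
        self.W_rec = [[0.05 if i == j else -0.05 for j in range(hidden)] for i in range(hidden)]
        self.W_q = [[0.1 * ((a + k) % 3 - 1) for k in range(2 * hidden)] for a in range(n_actions)]
        self.zeros_h = [0.0] * hidden
        self.zeros_q = [0.0] * n_actions
        self.h = [0.0] * hidden

    def reset(self):
        self.h = [0.0] * len(self.h)

    def step(self, obs):
        # Feedforward branch: processes the observation but stores nothing.
        ff = relu(linear(self.W_ff, self.zeros_h, obs))
        # Recurrent branch: only the selected variables enter the memory.
        selected = [obs[i] for i in self.memory_idx]
        from_input = linear(self.W_in, self.zeros_h, selected)
        from_memory = linear(self.W_rec, self.zeros_h, self.h)
        self.h = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
        # Q-values are computed from both branches concatenated.
        return linear(self.W_q, self.zeros_q, ff + self.h)

net = InfluenceAwareMemorySketch(obs_dim=4, memory_idx=[0])
q_values = net.step([1.0, 0.0, 0.0, 0.0])
memory_a = list(net.h)

net.reset()
net.step([1.0, 5.0, -2.0, 3.0])
memory_b = list(net.h)
```

Here `memory_a` and `memory_b` are identical because the two observations differ only in variables that never reach the recurrent branch, while those variables still affect the Q-values through the feedforward branch.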