Reinforcement learning with domain-specific relational inductive biases

Using Graph Neural Networks and domain knowledge


Abstract

Reinforcement Learning (RL) has been used to successfully train agents for many tasks, but generalizing to a different task - or even to unseen examples of the same task - remains difficult. In this thesis, Deep Reinforcement Learning (DRL) is combined with Graph Neural Networks (GNNs) and domain knowledge, with the aim of improving the generalization capabilities of RL agents.
In classical DRL setups, Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs) are often used as the neural network architectures for an agent's policy and/or value network. In this thesis, however, GNNs are used to represent the policy and value network of an agent, which allows for the application of relational inductive biases that are more domain-specific than those of MLPs and CNNs. Observations received by the agent from a simple navigation task - which requires some relational reasoning - are encoded as graphs of entities and the relations between them, where the relations are derived from domain knowledge. These graphs are then used as structured input for the GNN-based architecture of the agent. This approach is inspired by human relational reasoning, which is argued to be an important factor in human generalization capabilities.
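
As an illustration of this kind of graph encoding, the sketch below turns a single grid observation into a graph of entity nodes and typed spatial relations, assuming a PyTorch Geometric implementation. The entity types, relation types, and coordinates are hypothetical examples chosen for clarity, not the exact encoding used in the thesis.

```python
import torch
from torch_geometric.data import Data

# Hypothetical entity and relation vocabularies based on domain knowledge.
ENTITY_TYPES = ["agent", "key", "door", "goal"]           # node types (assumed)
RELATION_TYPES = ["same_row", "same_column", "adjacent"]  # edge types (assumed)

def encode_observation(entities):
    """Encode a list of (type_name, (row, col)) entities as a typed graph."""
    # One-hot node features indicating each entity's type.
    x = torch.zeros(len(entities), len(ENTITY_TYPES))
    for i, (etype, _) in enumerate(entities):
        x[i, ENTITY_TYPES.index(etype)] = 1.0

    # Add a typed edge for every spatial relation that holds between two entities.
    src, dst, edge_type = [], [], []
    for i, (_, (ri, ci)) in enumerate(entities):
        for j, (_, (rj, cj)) in enumerate(entities):
            if i == j:
                continue
            if ri == rj:
                src.append(i); dst.append(j); edge_type.append(RELATION_TYPES.index("same_row"))
            if ci == cj:
                src.append(i); dst.append(j); edge_type.append(RELATION_TYPES.index("same_column"))
            if abs(ri - rj) + abs(ci - cj) == 1:
                src.append(i); dst.append(j); edge_type.append(RELATION_TYPES.index("adjacent"))

    return Data(x=x,
                edge_index=torch.tensor([src, dst], dtype=torch.long),
                edge_type=torch.tensor(edge_type, dtype=torch.long))

# Example: agent at (0, 0), a key at (0, 2), and a door at (1, 2).
obs_graph = encode_observation([("agent", (0, 0)), ("key", (0, 2)), ("door", (1, 2))])
```

The resulting graph is what the GNN-based policy and value network consumes instead of a raw pixel or grid tensor.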
Several GNN-based architectures are proposed and compared, from which two main architectures are distilled: R-GCN-domain and R-GCN-GAN. In the R-GCN-domain architecture, the graph encoding of observations is based on domain knowledge, whereas in R-GCN-GAN we aim to combine the relational encoding of a CNN with additional, learned relations, allowing for an end-to-end solution that does not require domain knowledge. Sample efficiency and both in- and out-of-distribution generalization performance of our architectures are tested on a new grid world environment called 'Key-Corridors'. We find that adding domain-specific relational inductive biases with the R-GCN-domain architecture significantly improves sample efficiency and out-of-distribution generalization compared to MLPs and CNNs. However, we did not succeed in learning these domain-specific relational inductive biases with R-GCN-GAN, which fails to significantly outperform a CNN. Overall, the results indicate that applying relational reasoning in RL - through the use of GNNs and domain knowledge - can be an important tool for improving sample efficiency and generalization performance.
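
To make the idea of a GNN-based policy and value network concrete, the following is a minimal actor-critic sketch built from relational graph convolution layers, again assuming PyTorch Geometric. The class name, layer sizes, number of relations, and toy input are illustrative assumptions, not the actual configuration of R-GCN-domain or R-GCN-GAN.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import RGCNConv, global_mean_pool

class RGCNPolicyValueNet(nn.Module):
    """Sketch of an R-GCN actor-critic: a shared relational GNN body
    followed by separate policy and value heads."""

    def __init__(self, in_dim, hidden_dim, num_relations, num_actions):
        super().__init__()
        self.conv1 = RGCNConv(in_dim, hidden_dim, num_relations)
        self.conv2 = RGCNConv(hidden_dim, hidden_dim, num_relations)
        self.policy_head = nn.Linear(hidden_dim, num_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, x, edge_index, edge_type, batch):
        # Message passing over typed edges, then a graph-level readout.
        h = torch.relu(self.conv1(x, edge_index, edge_type))
        h = torch.relu(self.conv2(h, edge_index, edge_type))
        g = global_mean_pool(h, batch)
        return self.policy_head(g), self.value_head(g)

# Toy forward pass: 3 nodes with 4-dim features, 2 typed edges, one graph.
x = torch.eye(4)[:3]
edge_index = torch.tensor([[0, 1], [1, 2]])
edge_type = torch.tensor([0, 2])
batch = torch.zeros(3, dtype=torch.long)

net = RGCNPolicyValueNet(in_dim=4, hidden_dim=32, num_relations=3, num_actions=4)
action_logits, state_value = net(x, edge_index, edge_type, batch)
```

Such a network can then be trained with a standard policy-gradient or actor-critic algorithm in place of the usual CNN or MLP backbone.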