Learning What to Attend to

Using bisimulation metrics to explore and improve upon what a deep reinforcement learning agent learns

Abstract

We analyze the internal representations that deep Reinforcement Learning (RL) agents form of their environments and ask whether these representations correspond to what such agents should ideally learn. This comparison serves two purposes: to better understand why certain algorithms or network architectures perform better than others, and to develop methods that specifically target discrepancies between what is learned and what should be learned. The notion of an ideal representation we use is based on stochastic bisimulation and bisimulation metrics, which measure whether, and to what degree, states are behaviorally similar, respectively. An internal representation in which states are equivalent if and only if they are bisimilar, and in which distances between non-equivalent states are proportional to how behaviorally similar those states are, has several desirable theoretical properties. Yet we show empirically that the extent to which such a representation is learned in practice depends on several factors, and that an exact such representation is never formed. We further provide experimental results suggesting that learning a representation close to this target during training may improve learning speed and consistency, and that having learned one by the end of training may improve generalization.
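To make the notion of a bisimulation metric concrete, the following is a minimal sketch of the standard fixed-point computation on a small, hypothetical tabular MDP. It assumes deterministic transitions, so the Kantorovich (Wasserstein-1) term reduces to the distance between successor states; the reward and transition weights c_R and c_T, as well as the toy MDP itself, are illustrative choices and not taken from the paper.

```python
import numpy as np

# Hypothetical 4-state, 2-action MDP with deterministic transitions.
# States 0 and 1 are behaviorally identical by construction, so the
# metric between them should converge to 0.
R = np.array([[0.0, 1.0],    # rewards R[s, a]
              [0.0, 1.0],
              [1.0, 0.0],
              [0.5, 0.5]])
T = np.array([[2, 3],        # deterministic successor states T[s, a]
              [2, 3],
              [0, 1],
              [3, 3]])

c_R, c_T = 1.0, 0.9          # illustrative weights; c_T < 1 ensures a contraction
n_states, n_actions = R.shape

# Fixed-point iteration for the bisimulation metric
#   d(s, t) = max_a [ c_R * |R(s,a) - R(t,a)| + c_T * W1(P(.|s,a), P(.|t,a); d) ],
# where, for deterministic transitions, the W1 term is just d(T(s,a), T(t,a)).
d = np.zeros((n_states, n_states))
for _ in range(200):
    d_new = np.zeros_like(d)
    for s in range(n_states):
        for t in range(n_states):
            gaps = [c_R * abs(R[s, a] - R[t, a]) + c_T * d[T[s, a], T[t, a]]
                    for a in range(n_actions)]
            d_new[s, t] = max(gaps)
    if np.max(np.abs(d_new - d)) < 1e-8:   # converged to the fixed point
        d = d_new
        break
    d = d_new

print(np.round(d, 3))   # d[0, 1] is (approximately) 0: states 0 and 1 are bisimilar
```

Two states are bisimilar exactly when this distance is zero; the target representation discussed above would map such states to the same point and separate all other states in proportion to their distance under d.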