Title: Learning What to Attend to: Using bisimulation metrics to explore and improve upon what a deep reinforcement learning agent learns
Author: Albers, Nele (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Interactive Intelligence)
Contributors: Oliehoek, F.A. (mentor); Suau de Castro, M. (mentor); Spaan, M.T.J. (graduation committee); Brinkman, W.P. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Data Science and Technology
Date: 2020-08-12

Abstract: We analyze the internal representations that deep Reinforcement Learning (RL) agents form of their environments and whether these representations correspond to what such agents should ideally learn. The purpose of this comparison is both a better understanding of why certain algorithms or network architectures perform better than others and the development of methods that specifically target discrepancies between what is and what should be learned. The notion of an ideal representation we use is based on stochastic bisimulation and bisimulation metrics, which measure whether and to what degree states are behaviorally similar, respectively. Learning an internal representation in which states are equivalent if and only if they are bisimilar, and in which distances between non-equivalent states are proportional to how behaviorally similar those states are, has several desirable theoretical properties. Yet we show empirically that the extent to which such a representation is learned in practice depends on several factors, and that an exact such representation is never fully learned. We further provide experimental results suggesting that learning a representation close to this target internal state representation during training may improve learning speed and consistency, and that doing so by the end of training may improve generalization.
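The bisimulation metric central to the abstract can be illustrated with a minimal sketch (not taken from the thesis). For a deterministic MDP, the metric is the fixed point of d(s, s') = max_a [ |R(s, a) − R(s', a)| + γ · d(T(s, a), T(s', a)) ], since the Kantorovich term over transition distributions reduces to the distance between successor states. The reward and transition tables below are hypothetical:

```python
# Illustrative sketch, assuming a tiny deterministic MDP (hypothetical
# tables, not from the thesis): compute the bisimulation metric by
# fixed-point iteration from d = 0.
GAMMA = 0.9
states = [0, 1, 2]
actions = [0, 1]
R = {(0, 0): 1.0, (0, 1): 0.0,   # rewards R[s, a]
     (1, 0): 1.0, (1, 1): 0.0,
     (2, 0): 0.0, (2, 1): 1.0}
T = {(0, 0): 0, (0, 1): 1,       # deterministic successors T[s, a]
     (1, 0): 1, (1, 1): 0,
     (2, 0): 2, (2, 1): 2}

def bisim_metric(iters=100):
    d = {(s, t): 0.0 for s in states for t in states}
    for _ in range(iters):
        # One application of the fixed-point operator: reward gap plus
        # discounted distance between successor states, maximized over actions.
        d = {(s, t): max(abs(R[s, a] - R[t, a]) + GAMMA * d[T[s, a], T[t, a]]
                         for a in actions)
             for s in states for t in states}
    return d

d = bisim_metric()
# States 0 and 1 earn identical rewards and transition into each other
# consistently, so they are bisimilar and their distance converges to 0;
# state 2 behaves differently, so its distance to both stays large.
```

In the sense used in the abstract, an ideal internal representation would map states 0 and 1 to the same point while keeping state 2 at a distance proportional to d.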
Subject: Representation Learning; Bisimulation Metrics; Generalization; Deep Reinforcement Learning; Auxiliary Loss; Markovianity
To reference this document use: http://resolver.tudelft.nl/uuid:2945dcc8-e7b9-4536-b9e7-074cfe86d3f9
Part of collection: Student theses
Document type: master thesis
Rights: © 2020 Nele Albers
Files: PDF, Nele_Albers_Master_Thesis ... end_to.pdf (179.54 MB)