Maximizing Information Gain in Partially Observable Environments via Prediction Rewards

Satsangi, Yash; Lim, Sungsu; Whiteson, Shimon; Oliehoek, F.A.; White, Martha

Title

Maximizing Information Gain in Partially Observable Environments via Prediction Rewards

Author

Satsangi, Yash (University of Alberta)
Lim, Sungsu (University of Alberta)
Whiteson, Shimon (University of Oxford)
Oliehoek, F.A. (TU Delft Interactive Intelligence)
White, Martha (University of Alberta)

Contributor

An, Bo (editor)
El Fallah Seghrouchni, Amal (editor)
Sukthankar, Gita (editor)

Date

2020-05-09

Abstract

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem in which the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a function of state-action pairs rather than of the agent's belief; this hinders the direct application of deep RL methods to such tasks. This paper tackles the challenge of using belief-based rewards for a deep RL agent by offering a simple insight: maximizing any convex function of the agent's belief can be approximated by instead maximizing a prediction reward, that is, a reward based on prediction accuracy. In particular, we derive the exact error between negative entropy and the expected prediction reward. This insight provides theoretical motivation for several fields that use prediction rewards (namely visual attention, question answering systems, and intrinsic motivation) and highlights their connection to the usually distinct fields of active perception, active sensing, and sensor placement. Based on this insight, we present deep anticipatory networks (DANs), which enable an agent to take actions to reduce its uncertainty without performing explicit belief inference. We present two applications of DANs: building a sensor selection system for tracking people in a shopping mall and learning discrete models of attention on Fashion MNIST and MNIST digit classification.
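
The relationship between a belief-based reward and a prediction reward can be illustrated with a small sketch. It is not taken from the paper: it assumes a discrete hidden variable, and it assumes a 0/1 prediction reward (1 when the agent's guess matches the hidden value, 0 otherwise), which may differ from the paper's exact definition. The function names `negative_entropy` and `expected_prediction_reward` are hypothetical. Under these assumptions, the expected prediction reward of the best guess equals the largest belief probability, which, like negative entropy, is a convex function of the belief and grows as the belief becomes more certain.

```python
import math


def negative_entropy(belief):
    """Belief-based reward: sum of p * log(p), taking 0 * log(0) as 0."""
    return sum(p * math.log(p) for p in belief if p > 0.0)


def expected_prediction_reward(belief):
    """Expected 0/1 reward when the agent guesses its most likely value.

    Assumption for this sketch: the reward is 1 if the guess equals the
    hidden value and 0 otherwise, so the expectation is max(belief).
    """
    guess = max(range(len(belief)), key=lambda i: belief[i])
    return sum(p * (1.0 if x == guess else 0.0) for x, p in enumerate(belief))


# Three illustrative beliefs over a hidden variable with four values.
beliefs = {
    "uniform": [0.25, 0.25, 0.25, 0.25],
    "peaked": [0.7, 0.1, 0.1, 0.1],
    "certain": [1.0, 0.0, 0.0, 0.0],
}

for name, b in beliefs.items():
    print(name, round(negative_entropy(b), 3), round(expected_prediction_reward(b), 3))
```

Both quantities rank the three example beliefs in the same order, from uniform to certain. The gap between them is what the abstract refers to as the error between negative entropy and the expected prediction reward; this sketch does not reproduce that derivation.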

Subject

Information gain
Partial observability
Reinforcement learning

To reference this document use:

http://resolver.tudelft.nl/uuid:401e975f-91e1-4979-83d0-1a265ba0cd50

Publisher

International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), Richland, SC

Embargo date

2021-06-21

ISBN

9781450375184

Source

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020

Event

AAMAS 2020, 2020-05-09 → 2020-05-13, Auckland, New Zealand

Series

Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, 1548-8403, 2020-May

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Part of collection

Institutional Repository

Document type

conference paper

Files

PDF

Satsangi20AAMAS.pdf

2.07 MB
