Efficient exploration with Double Uncertain Value Networks

Abstract

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration: parametric uncertainty, which originates from limited data, and return uncertainty, which originates from the distribution of the returns itself. We show how to learn both distributions with deep neural networks, estimating parametric uncertainty with Bayesian dropout and propagating return uncertainty through the Bellman equation as a Gaussian distribution. We then show that both can be jointly estimated in a single network, which we call the Double Uncertain Value Network. The policy is derived directly from the learned distributions via Thompson sampling. Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge.
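To make the exploration mechanism concrete, the sketch below illustrates one ingredient mentioned in the abstract: Thompson sampling over Q-values whose parametric uncertainty is approximated with Bayesian (MC) dropout. This is a minimal, illustrative example, not the paper's full method; the class and function names, network sizes, and dropout rate are assumptions, and the joint estimation of return uncertainty in the Double Uncertain Value Network is not shown here.

```python
# Illustrative sketch only: MC-dropout Q-network + Thompson-sampled action.
# Assumed names (DropoutQNetwork, thompson_action) and hyperparameters are
# not from the paper; they only demonstrate the general idea.
import torch
import torch.nn as nn


class DropoutQNetwork(nn.Module):
    """Q-network with dropout layers kept active at action-selection time,
    so each stochastic forward pass approximates a draw from a posterior
    over Q-values (parametric uncertainty via Bayesian dropout)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def thompson_action(q_net: DropoutQNetwork, state) -> int:
    """Thompson sampling: take one stochastic forward pass (dropout on)
    and act greedily with respect to the sampled Q-values."""
    q_net.train()  # keep dropout active so the pass is a posterior sample
    with torch.no_grad():
        q_sample = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q_sample.argmax().item())


# Usage example with made-up dimensions:
# q_net = DropoutQNetwork(state_dim=4, n_actions=2)
# action = thompson_action(q_net, [0.1, -0.2, 0.0, 0.3])
```

Because each action choice is greedy with respect to a single posterior sample, actions whose value estimates are still uncertain are tried more often early on, which is the directed-exploration behaviour the abstract refers to.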