Generalized Optimistic Q-Learning with Provable Efficiency

Conference Paper (2020)
Author(s)

Greg Neustroev (TU Delft - Algorithmics)

Mathijs de Weerdt (TU Delft - Algorithmics)

Publication Year
2020
Language
English
Pages (from-to)
913-921
ISBN (electronic)
978-1-4503-7518-4
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward, or distribute the text or any part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement learning (RL), like any online learning method, inevitably faces the exploration-exploitation dilemma. A learning algorithm that needs as few data samples as possible to perform well is called sample efficient, and the design of sample-efficient algorithms is an important area of research. Interestingly, all currently known provably efficient model-free RL algorithms utilize the same well-known principle of optimism in the face of uncertainty. We unite these existing algorithms into a single general model-free optimistic RL framework. We show how this facilitates the design of new optimistic model-free RL algorithms by simplifying the analysis of their efficiency. Finally, we propose one such new algorithm and demonstrate its performance in an experimental study.
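The principle of optimism in the face of uncertainty that the abstract refers to can be illustrated with a minimal tabular sketch. This is not the paper's generalized framework: the step size and square-root exploration bonus loosely follow UCB-style Q-learning analyses, and the small chain MDP is invented purely for illustration.

```python
import math

def optimistic_q_learning(n_states, n_actions, step, episodes, H, c=0.5):
    """Episodic tabular Q-learning with optimistic initialization and a
    UCB-style bonus (an illustrative sketch, not the paper's algorithm)."""
    # Optimistic initialization: every value starts at its upper bound H;
    # Q[H] is the all-zero value beyond the horizon.
    Q = [[[float(H)] * n_actions for _ in range(n_states)] for _ in range(H)]
    Q.append([[0.0] * n_actions for _ in range(n_states)])
    N = [[[0] * n_actions for _ in range(n_states)] for _ in range(H)]
    for _ in range(episodes):
        s = 0  # every episode starts in state 0
        for h in range(H):
            # Act greedily w.r.t. the optimistic estimates: under-explored
            # actions keep inflated values, so optimism drives exploration.
            a = max(range(n_actions), key=lambda a_: Q[h][s][a_])
            s2, r = step(s, a)
            N[h][s][a] += 1
            t = N[h][s][a]
            alpha = (H + 1) / (H + t)          # rate used in UCB-style analyses
            bonus = c * math.sqrt(H / t)       # optimism bonus, shrinks with visits
            target = r + bonus + max(Q[h + 1][s2])
            # Clip at the largest achievable return from step h onward.
            Q[h][s][a] = min(H - h, (1 - alpha) * Q[h][s][a] + alpha * target)
            s = s2
    return Q

def step(s, a):
    """Toy chain MDP (hypothetical): in state 0, action 1 moves to the
    absorbing state 1, which yields reward 1 forever; action 0 stays put."""
    if s == 0:
        return (1, 0.0) if a == 1 else (0, 0.0)
    return 1, 1.0

if __name__ == "__main__":
    Q = optimistic_q_learning(n_states=2, n_actions=2, step=step,
                              episodes=500, H=5)
    # The bonus has shrunk for well-visited pairs; the greedy policy at the
    # first step should now prefer moving toward the rewarding state.
    print(Q[0][0])
```

Since the initial values are an upper bound and the bonus dominates the estimation error with high probability, estimates stay optimistic while shrinking toward the true values; this is the shared structure the paper abstracts over to unify existing provably efficient model-free algorithms.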

Files

P913.pdf
(pdf | 1.72 MB)
License info not available