Generalized Optimistic Q-Learning with Provable Efficiency

Conference Paper (2020)
Author(s)

G. Neustroev (TU Delft - Algorithmics)

M.M. De Weerdt (TU Delft - Algorithmics)

Research Group
Algorithmics
Copyright
© 2020 G. Neustroev, M.M. de Weerdt
Publication Year
2020
Language
English
Pages (from-to)
913-921
ISBN (electronic)
978-1-4503-7518-4
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement learning (RL), like any online learning method, inevitably faces the exploration-exploitation dilemma. A learning algorithm that requires as few data samples as possible is called sample efficient, and the design of sample-efficient algorithms is an important area of research. Interestingly, all currently known provably efficient model-free RL algorithms utilize the same well-known principle of optimism in the face of uncertainty. We unite these existing algorithms into a single general model-free optimistic RL framework. We show how this facilitates the design of new optimistic model-free RL algorithms by simplifying the analysis of their efficiency. Finally, we propose one such new algorithm and demonstrate its performance in an experimental study.
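
As a concrete illustration of the optimism principle the abstract refers to, below is a minimal sketch of one member of this family: tabular optimistic Q-learning with a UCB-style exploration bonus, in the spirit of Jin et al. (2018). It is not the paper's generalized algorithm. The environment interface (env.reset(), env.step()), the horizon H, the state and action counts S and A, and the bonus constant c are assumptions made for illustration only.

```python
import numpy as np

def optimistic_q_learning(env, S, A, H, num_episodes, c=1.0):
    """Sketch of optimistic Q-learning with UCB bonuses for an
    episodic tabular MDP with S states, A actions, and horizon H.
    The env interface (reset/step) is a hypothetical assumption."""
    # Optimistic initialization: Q starts at the maximum possible return H.
    Q = np.full((H, S, A), float(H))
    N = np.zeros((H, S, A), dtype=int)        # visit counts per (step, state, action)
    iota = np.log(S * A * H * num_episodes)   # log factor from the regret analysis

    for _ in range(num_episodes):
        s = env.reset()
        for h in range(H):
            a = int(np.argmax(Q[h, s]))            # act greedily w.r.t. optimistic Q
            s_next, r, done = env.step(a)          # assumed (state, reward, done) API
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)              # step size used in the analysis
            bonus = c * np.sqrt(H**3 * iota / t)   # optimism: UCB exploration bonus
            # Value of the next step, clipped at H; zero at the final step.
            v_next = 0.0 if h == H - 1 else min(float(H), Q[h + 1, s_next].max())
            Q[h, s, a] += alpha * (r + v_next + bonus - Q[h, s, a])
            s = s_next
            if done:
                break
    return Q
```

The two ingredients that make this optimistic, the high initialization and the count-based bonus, are exactly the kind of design choices the paper's framework abstracts over, so that new variants can be analyzed within one proof template.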

Files

P913.pdf
(pdf | 1.72 Mb)