Generalized Optimistic Q-Learning with Provable Efficiency

Conference Paper (2020)

Author(s)

G. Neustroev (TU Delft - Algorithmics)

M.M. De Weerdt (TU Delft - Algorithmics)

Research Group

Algorithmics

Keywords

Reinforcement learning, Model-free learning, Sample efficiency

To reference this document use:

https://resolver.tudelft.nl/uuid:6ea9bc9a-b697-40fc-8c7f-2c04947b74ce

Publication Year

2020

Language

English

Abstract

Reinforcement learning (RL), like any on-line learning method, inevitably faces the exploration-exploitation dilemma. When a learning algorithm requires as few data samples as possible, it is called sample efficient. The design of sample-efficient algorithms is an important area of research. Interestingly, all currently known provably efficient model-free RL algorithms utilize the same well-known principle of optimism in the face of uncertainty. We unite these existing algorithms into a single general model-free optimistic RL framework. We show how this facilitates the design of new optimistic model-free RL algorithms by simplifying the analysis of their efficiency. Finally, we propose one such new algorithm and demonstrate its performance in an experimental study.

Files

P913.pdf (pdf | 1.72 Mb)

License info not available