Title: Generalized Optimistic Q-Learning with Provable Efficiency
Authors: Neustroev, G. (TU Delft Algorithmics); de Weerdt, M.M. (TU Delft Algorithmics)
Contributors: An, Bo (editor); El Fallah Seghrouchni, Amal (editor); Sukthankar, Gita (editor)
Date: 2020-05
Abstract: Reinforcement learning (RL), like any online learning method, inevitably faces the exploration-exploitation dilemma. A learning algorithm that requires as few data samples as possible is called sample efficient, and the design of sample-efficient algorithms is an important area of research. Interestingly, all currently known provably efficient model-free RL algorithms rely on the same well-known principle of optimism in the face of uncertainty. We unite these existing algorithms into a single general model-free optimistic RL framework. We show how this facilitates the design of new optimistic model-free RL algorithms by simplifying the analysis of their efficiency. Finally, we propose one such new algorithm and demonstrate its performance in an experimental study.
Subject: Model-free learning; Reinforcement learning; Sample efficiency
To reference this document use: http://resolver.tudelft.nl/uuid:6ea9bc9a-b697-40fc-8c7f-2c04947b74ce
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
ISBN: 978-1-4503-7518-4
Source: Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020
Event: AAMAS 2020, 2020-05-09 → 2020-05-13, Auckland, New Zealand
Series: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, ISSN 1548-8403, 2020-May
Bibliographical note: Virtual/online event due to COVID-19
Part of collection: Institutional Repository
Document type: conference paper
Rights: © 2020 G. Neustroev, M.M. de Weerdt
Files: p913.pdf (PDF, 1.72 MB)
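
To illustrate the principle of optimism in the face of uncertainty that the abstract refers to, below is a minimal sketch of a tabular optimistic Q-learning loop with a count-based exploration bonus. This is not the paper's generalized framework or its proposed algorithm; the bonus form, learning-rate schedule, optimistic initialization, and environment interface (reset, step, num_actions) are illustrative assumptions.

import math
from collections import defaultdict

def optimistic_q_learning(env, episodes=500, horizon=100,
                          gamma=0.99, bonus_scale=1.0):
    # Optimistic initial Q-values: every unseen pair starts at the maximum
    # attainable discounted return, which encourages trying it at least once.
    q = defaultdict(lambda: 1.0 / (1.0 - gamma))
    counts = defaultdict(int)  # visit counts N(s, a)

    for _ in range(episodes):
        state = env.reset()
        for _ in range(horizon):
            # Act greedily with respect to the optimistic Q-estimates.
            action = max(range(env.num_actions), key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)

            counts[(state, action)] += 1
            n = counts[(state, action)]
            alpha = 1.0 / n                            # assumed learning-rate schedule
            bonus = bonus_scale * math.sqrt(1.0 / n)   # assumed UCB-style bonus

            # Optimistic one-step target: observed reward plus exploration
            # bonus plus discounted value of the best next action.
            target = reward + bonus + gamma * max(
                q[(next_state, a)] for a in range(env.num_actions))
            q[(state, action)] += alpha * (target - q[(state, action)])

            state = next_state
            if done:
                break
    return q

Because rarely visited state-action pairs keep a large bonus and a high initial value, the greedy policy is steered toward under-explored actions; this is the sample-efficiency mechanism shared by the provably efficient model-free algorithms the paper unifies.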