Title: Interval Q-Learning: Balancing Deep and Wide Exploration

Author:
Neustroev, G. (TU Delft Algorithmics)
Ponnambalam, C.T. (TU Delft Algorithmics)
de Weerdt, M.M. (TU Delft Algorithmics)
Spaan, M.T.J. (TU Delft Algorithmics)

Date: 2020

Abstract: Reinforcement learning requires exploration, leading to repeated execution of sub-optimal actions. Naive exploration techniques address this problem by changing gradually from exploration to exploitation. This approach employs a wide search resulting in exhaustive exploration and low sample-efficiency. More advanced search methods explore optimistically based on an upper bound estimate of expected rewards. These methods employ deep search, aiming to reach states not previously visited. Another deep search strategy is found in action-elimination methods, which aim to discover and eliminate sub-optimal actions. Despite the effectiveness of advanced deep search strategies, some problems are better suited to naive exploration. We devise a new method, called Interval Q-Learning, that finds a balance between wide and deep search. It assigns a small probability to taking sub-optimal actions and combines both greedy and optimistic exploration. This allows for fast convergence to a near-optimal policy, and then exploration around it. We demonstrate the performance of tabular and deep Q-network versions of Interval Q-Learning, showing that it offers convergence speed-up both in problems that favor wide exploration methods and those that favor deep search strategies.

To reference this document use: http://resolver.tudelft.nl/uuid:66399260-9551-46f5-92a8-bed977898f99

Source: Adaptive and Learning Agents Workshop
Event: Adaptive and Learning Agents Workshop 2020, 2020-05-09 → 2020-05-10
Bibliographical note: Virtual/online event due to COVID-19
Part of collection: Institutional Repository
Document type: conference paper
Rights: © 2020 G. Neustroev, C.T. Ponnambalam, M.M. de Weerdt, M.T.J. Spaan

Files: PDF ALA2020_paper_38_1.pdf (665.67 KB)
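
Illustrative note: the abstract contrasts "wide" naive exploration, which shifts gradually from exploration to exploitation, with "deep" optimistic exploration based on an upper-bound estimate of expected rewards. The sketch below shows generic textbook versions of these two action-selection styles for a single state. It is not the Interval Q-Learning method from the paper, whose details are not given in this record. All function names, the linear epsilon schedule, and the UCB1-style bonus are assumptions chosen for illustration.

```python
import math
import random


def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Hypothetical linear schedule: moves gradually from exploring to exploiting."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


def epsilon_greedy_action(q_values, epsilon, rng):
    """Wide search: with probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest current value estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def optimistic_action(q_values, counts, total_steps, bonus_scale=1.0):
    """Deep search: act greedily on an upper-bound estimate of each action's value.
    The bonus is a UCB1-style term (an assumption, not taken from the paper)."""
    def upper_bound(a):
        if counts[a] == 0:
            return math.inf  # untried actions are treated as maximally promising
        bonus = math.sqrt(math.log(max(total_steps, 1)) / counts[a])
        return q_values[a] + bonus_scale * bonus
    return max(range(len(q_values)), key=upper_bound)


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3]
    for step in (0, 500, 1000):
        eps = decayed_epsilon(step)
        action = epsilon_greedy_action(q, eps, rng)
        print(f"step={step} epsilon={eps:.3f} epsilon-greedy action={action}")
    print("optimistic action:", optimistic_action(q, [5, 5, 0], 10))
```

In this sketch the epsilon-greedy rule keeps sampling every action with some probability, while the optimistic rule steers toward actions that have been tried least, which is the wide versus deep distinction the abstract draws.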