Improving (ε,δ)-PAC Monte Carlo tree search using spherical confidence regions

Master thesis (2020)

Authors

T.V. Schiet Electrical Engineering, Mathematics and Computer Science

Contributors

K.I. Aardal Discrete Mathematics and Optimization - (mentor)

W.M. Koolen Centrum Wiskunde & Informatica (CWI) (mentor)

F.A. Oliehoek Interactive Intelligence - (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science

More Info

expand_more

To reference this document use:

http://resolver.tudelft.nl/uuid:a595fb2b-216f-4085-a4b0-f34cb6af4341

Published Date

14-12-2020

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

We consider a simplified version of the Monte Carlo tree search (MCTS) problem, a problem where, given a game tree with stochastic reward, one is tasked with finding the best move from the root. This problem is well studied, and recently impressive results have been obtained. For example, in 2016, when the AlphaGo program beat the professional Go player Lee Sedol. Even though these results were spectacular, not much is known about the guaranteed accuracy of these algorithms. In this work, we review the FindTopWinner algorithm for the MCTS problem. The FindTopWinner algorithm is guaranteed to give an accurate result with high probability, as is formalized in the (ε,δ)-PAC learning framework. However, its sample complexity is not optimal. We improve FindTopWinner by introducing the Spherical Confidence Region- MCTS (SCR-MCTS) algorithm, which operates by creating a spherical confidence region on the joint values of the leaves in multiple iterations called epochs. We show that SCR-MCTS has the (ε,δ)-PAC guarantee, and we show that empirically SCR-MCTS draws fewer samples than FindTopWinner does on various benchmark trees.

Files

MSc_thesis.pdf

(.pdf | 1.02 Mb)