Improving (ε,δ)-PAC Monte Carlo tree search using spherical confidence regions

More Info
expand_more

Abstract

We consider a simplified version of the Monte Carlo tree search (MCTS) problem, a problem where, given a game tree with stochastic reward, one is tasked with finding the best move from the root. This problem is well studied, and recently impressive results have been obtained. For example, in 2016, when the AlphaGo program beat the professional Go player Lee Sedol. Even though these results were spectacular, not much is known about the guaranteed accuracy of these algorithms. In this work, we review the FindTopWinner algorithm for the MCTS problem. The FindTopWinner algorithm is guaranteed to give an accurate result with high probability, as is formalized in the (ε,δ)-PAC learning framework. However, its sample complexity is not optimal. We improve FindTopWinner by introducing the Spherical Confidence Region- MCTS (SCR-MCTS) algorithm, which operates by creating a spherical confidence region on the joint values of the leaves in multiple iterations called epochs. We show that SCR-MCTS has the (ε,δ)-PAC guarantee, and we show that empirically SCR-MCTS draws fewer samples than FindTopWinner does on various benchmark trees.

Files