Action Selection Policies for Walking Monte Carlo Tree Search

Abstract

Recent Reinforcement Learning methods have combined function approximation with Monte Carlo Tree Search and are able to learn through self-play to a very high level in several games, such as Go and Hex. One aspect of this combination that has received little attention is the action selection policy used during self-play, which can influence how efficiently learning proceeds in these games. Inspired by these recent methods, we propose a sample-based planning method that uses Monte Carlo Tree Search in a manner akin to self-play. Using this method we explore a variety of action selection policies based on the statistics obtained with Monte Carlo Tree Search. We found that the action selection policies, combined with a parameter controlling the amount of exploration, had an effect on the speed of learning. The results suggest that methods using self-play to learn about an environment should consider the action selection policy in order to improve performance and learning efficiency. Since our method was able to learn faster than standard Monte Carlo Tree Search, the proposed method is itself interesting to study further.
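To make the idea of selecting actions from search statistics concrete, the sketch below shows three commonly used selection rules computed from root-node visit counts and value estimates. This is an illustrative assumption, not the paper's exact implementation; the function names and the temperature parameter `tau` (standing in for the abstract's "parameter controlling the amount of exploration") are hypothetical.

```python
# Illustrative sketch: action selection from root-node MCTS statistics.
# Assumed, not taken from the paper: function names and the `tau` parameter.
import numpy as np

def select_greedy_visits(visit_counts):
    """Pick the most-visited action (the 'robust child')."""
    return int(np.argmax(visit_counts))

def select_greedy_value(mean_values):
    """Pick the action with the highest estimated mean value."""
    return int(np.argmax(mean_values))

def select_proportional(visit_counts, tau=1.0, rng=None):
    """Sample an action with probability proportional to N(s, a)^(1/tau).

    `tau` controls exploration: as tau -> 0 this approaches the greedy
    choice, while larger tau spreads probability mass over
    less-visited actions.
    """
    rng = rng or np.random.default_rng()
    counts = np.asarray(visit_counts, dtype=np.float64)
    weights = np.power(counts, 1.0 / tau)
    probs = weights / weights.sum()
    return int(rng.choice(len(counts), p=probs))

# Example: statistics for four actions at the root after a search.
visits = [120, 45, 30, 5]
values = [0.52, 0.61, 0.40, 0.10]
print(select_greedy_visits(visits))          # 0
print(select_greedy_value(values))           # 1
print(select_proportional(visits, tau=1.0))  # stochastic
```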