Action Selection Policies for Walking Monte Carlo Tree Search

Abstract

Recent Reinforcement Learning methods have combined function approximation with Monte Carlo Tree Search and are able to learn through self-play to a very high level in several games, such as Go and Hex. One aspect of this combination that has received little attention is the action selection policy used during self-play, which can influence how efficiently learning proceeds in these games. Inspired by these recent methods, we propose a sample-based planning method that uses Monte Carlo Tree Search in a manner akin to self-play. Using this method we explore a variety of action selection policies based on the statistics obtained with Monte Carlo Tree Search. We found that the action selection policies, combined with a parameter controlling the amount of exploration, had an effect on the speed of learning. The results suggest that methods using self-play to learn about an environment should consider the action selection policy in order to improve performance and learning efficiency. Since our method was able to learn faster than standard Monte Carlo Tree Search, the proposed method is itself interesting to study further.
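To make the idea of selecting actions from search statistics concrete, the sketch below shows three commonly used selection rules computed from root-node visit counts and value estimates. This is an illustrative assumption, not the paper's exact implementation; the function names and the temperature parameter `tau` (standing in for the abstract's "parameter controlling the amount of exploration") are hypothetical.

```python
# Illustrative sketch: action selection from root-node MCTS statistics.
# Assumed, not taken from the paper: function names and the `tau` parameter.
import numpy as np

def select_greedy_visits(visit_counts):
    """Pick the most-visited action (the 'robust child')."""
    return int(np.argmax(visit_counts))

def select_greedy_value(mean_values):
    """Pick the action with the highest estimated mean value."""
    return int(np.argmax(mean_values))

def select_proportional(visit_counts, tau=1.0, rng=None):
    """Sample an action with probability proportional to N(s, a)^(1/tau).

    `tau` controls exploration: as tau -> 0 this approaches the greedy
    choice, while larger tau spreads probability mass over
    less-visited actions.
    """
    rng = rng or np.random.default_rng()
    counts = np.asarray(visit_counts, dtype=np.float64)
    weights = np.power(counts, 1.0 / tau)
    probs = weights / weights.sum()
    return int(rng.choice(len(counts), p=probs))

# Example: statistics for four actions at the root after a search.
visits = [120, 45, 30, 5]
values = [0.52, 0.61, 0.40, 0.10]
print(select_greedy_visits(visits))          # 0
print(select_greedy_value(values))           # 1
print(select_proportional(visits, tau=1.0))  # stochastic
```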