Title: Action Selection Policies for Walking Monte Carlo Tree Search
Author: Starre, Rolf (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Loog, Marco (mentor); Reinders, Marcel (graduation committee); Finavaro Aniche, Mauricio (graduation committee)
Degree granting institution: Delft University of Technology
Date: 2018-08-29
Abstract: Recent reinforcement learning methods have combined function approximation with Monte Carlo Tree Search and can learn through self-play to a very high level in games such as Go and Hex. One aspect of this combination that has received little attention is the action selection policy used during self-play, which can influence the efficiency of learning in the studied games. Inspired by these recent methods, we propose a sample-based planning method that uses Monte Carlo Tree Search in a manner akin to self-play. Using this method, we explore a variety of action selection policies based on the statistics obtained with Monte Carlo Tree Search. We found that the action selection policies, combined with a parameter controlling the amount of exploration, affected the speed of learning. The results suggest that methods using self-play to learn about an environment should consider the action selection policy in order to improve performance and learning efficiency. Since our method was able to learn faster than standard Monte Carlo Tree Search, the proposed method is itself interesting to study further.
Subject: Monte Carlo Tree Search; Reinforcement Learning; Exploration; Action selection policies
To reference this document use: http://resolver.tudelft.nl/uuid:3947ef53-eab3-46a2-9efc-fff985cd96c9
Part of collection: Student theses
Document type: master thesis
Rights: © 2018 Rolf Starre
Files: Thesis_RolfStarre.pdf (PDF, 2.41 MB)
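The abstract refers to action selection policies based on the statistics obtained with Monte Carlo Tree Search. As a minimal illustrative sketch (not the thesis's actual implementation, and with hypothetical names), two common such policies are UCB1 selection inside the tree and greedy-by-visit-count selection of the final move:

```python
import math

# Hypothetical per-action statistics at an MCTS root node:
# each entry is (visit_count, total_value). Illustrative only.
def ucb1(stats, total_visits, c=1.4):
    """Return the index of the action maximising the UCB1 score.

    UCB1 trades off exploitation (mean value per visit) against
    exploration (a bonus that shrinks as an action is visited more).
    """
    best, best_score = None, float("-inf")
    for i, (n, w) in enumerate(stats):
        if n == 0:
            return i  # always try unvisited actions first
        score = w / n + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best, best_score = i, score
    return best

def greedy_by_visits(stats):
    """A simple final-selection policy: pick the most-visited action."""
    return max(range(len(stats)), key=lambda i: stats[i][0])

stats = [(10, 6.0), (5, 4.0), (1, 0.5)]
print(ucb1(stats, total_visits=16))  # -> 2 (exploration bonus favours the rarely tried action)
print(greedy_by_visits(stats))       # -> 0 (most visited)
```

The exploration constant `c` plays the role of the exploration parameter mentioned in the abstract: larger values push selection toward less-visited actions, smaller values toward actions with high average value.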