Title: General Tree Evaluation for AlphaZero
Author: Jaldevik, Albin (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Bohmer, Wendelin (mentor); Yorke-Smith, N. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Artificial Intelligence
Date: 2024-06-24

Abstract: Over the last decade, there have been significant advances in model-based deep reinforcement learning. One of the most successful such algorithms is AlphaZero, which combines Monte Carlo Tree Search with deep learning. AlphaZero and its successors commonly describe a unified framework for tree construction and acting: for instance, build the tree with PUCT and act according to visitation counts. Policies based on visitation counts inherently make assumptions about how the tree was constructed. This is problematic because it constrains the construction algorithm. For example, breadth-first tree construction yields a uniform visitation policy. To address this, we investigate the goals when extracting policies from decision trees and propose novel construction-decoupled policies. Furthermore, we use these policies to modify how decision nodes are evaluated, and we utilize this evaluation during tree construction. We support the claim that our novel policies can benefit AlphaZero with theoretical analysis and empirical evidence. Our results on classical Gym environments show that the benefits are especially prominent for limited simulation budgets. The code is available through GitHub.
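The abstract's remark that breadth-first construction yields a uniform visitation policy can be illustrated with a small sketch. Everything below is a hypothetical illustration and is not taken from the thesis or its repository: the function names, the uniform branching factor, and the example value estimates are all assumptions, and the greedy value-based choice shown for contrast is a generic stand-in, not one of the thesis's proposed policies.

```python
from collections import deque


def breadth_first_visits(branching: int, budget: int) -> list[int]:
    """Count expanded nodes beneath each root action in a breadth-first tree.

    Assumes (for illustration only) a uniform branching factor and a budget
    counted in node expansions below the root.
    """
    counts = [0] * branching
    # Each queue entry records which root action the pending node descends from.
    queue = deque(range(branching))
    expanded = 0
    while queue and expanded < budget:
        root_action = queue.popleft()
        counts[root_action] += 1
        expanded += 1
        # Children inherit the root action of their parent.
        queue.extend([root_action] * branching)
    return counts


def visitation_policy(counts: list[int]) -> list[float]:
    """Normalize visit counts into a probability distribution over root actions."""
    total = sum(counts)
    return [c / total for c in counts]


def greedy_value_action(values: list[float]) -> int:
    """Pick the root action with the highest value estimate (generic contrast)."""
    return max(range(len(values)), key=lambda a: values[a])


if __name__ == "__main__":
    counts = breadth_first_visits(branching=3, budget=12)
    print("visit counts:", counts)
    print("visitation policy:", visitation_policy(counts))
    # Hypothetical value estimates for the three root actions.
    print("greedy value action:", greedy_value_action([0.1, 0.9, 0.3]))
```

With a branching factor of 3 and a budget of 12 expansions (two complete levels), every root action receives 4 visits, so the visitation policy is uniform regardless of how good each action is, whereas a policy that reads value estimates can still prefer the second action.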
Subject: AlphaZero; Monte Carlo Tree Search; Deep Reinforcement Learning; Sequential decision making; Model-Based Reinforcement Learning; Artificial Intelligence; Machine learning
To reference this document use: http://resolver.tudelft.nl/uuid:5d5fd035-eed6-4176-85d3-f31deecb6133
Bibliographical note: https://github.com/albinjal/GeneralAlphaZero (GitHub)
Part of collection: Student theses
Document type: master thesis
Rights: © 2024 Albin Jaldevik
Files: GeneralAlphaZero.pdf (PDF, 1.43 MB)