Change of Plans! Adaptive AlphaZero Planning Methods for Novel Test Environments

Master Thesis (2025)

Author(s)

I. Tamassia (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.W. Böhmer – Mentor (TU Delft - Sequential Decision Making)

A. Lukina – Graduation committee member (TU Delft - Algorithmics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Reinforcement Learning, Deep Learning, Monte Carlo Tree Search, AlphaZero, Sequential Decision Making, Model-Based Reinforcement Learning

To reference this document use:

https://resolver.tudelft.nl/uuid:39385de2-959d-4fb8-984d-aecdc7046729

Publication Year

2025

Language

English

Graduation Date

25-06-2025

Awarding Institution

Delft University of Technology

Programme

Computer Science | Artificial Intelligence

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

AlphaZero and its successors employ learned value and policy functions to enable more efficient and effective planning at deployment. A standard assumption is that the agent will be deployed in the same environment where these estimators were trained; changes to the environment would otherwise violate their expectations and could result in suboptimal decisions. In this work, we investigate how environment changes affect the usability of the learned estimators and develop criteria that can quickly detect and localize such changes. Moreover, we develop novel planning methods that leverage these principles as well as further modifications of standard Monte Carlo planning techniques. These methods demonstrate superior performance under several tested environment configurations. The main assumptions and limitations of our approaches are also discussed, providing a foundation for future research to broaden their applicability. The code is available on GitHub.

Files

MSc_Thesis_Change_of_Plans-Fin... (pdf)

(pdf | 7.31 MB)

License info not available