Change of Plans! Adaptive AlphaZero Planning Methods for Novel Test Environments
I. Tamassia (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.W. Böhmer – Mentor (TU Delft - Sequential Decision Making)
A. Lukina – Graduation committee member (TU Delft - Algorithmics)
Abstract
AlphaZero and its successors use learned value and policy functions to make planning at deployment more efficient and effective. A standard assumption is that the agent is deployed in the same environment in which these estimators were trained; otherwise, environment changes can invalidate their estimates and lead to suboptimal decisions. In this work, we investigate how environment changes affect the usability of the learned estimators and develop criteria that can quickly detect and localize such changes. Moreover, we develop novel planning methods that leverage these principles together with further modifications of standard Monte Carlo planning techniques. These methods demonstrate superior performance in several tested environment configurations. We also discuss the main assumptions and limitations of our approaches, providing a foundation for future research to broaden their applicability. The code is available on GitHub.
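To make concrete how learned value and policy functions enter planning, the sketch below shows the PUCT selection rule used by AlphaZero's tree search. There, the policy prior P(s, a) scales the exploration bonus, and the value network's leaf estimate is backed up into Q(s, a). This is a minimal, hypothetical illustration and not the thesis code. It assumes a single-agent, undiscounted setting with no sign flipping between players and a constant `c_puct`, which simplifies AlphaZero's visit-dependent exploration constant. The names `Node`, `puct_score`, `select_action`, `backup` and the default `c_puct=1.25` are illustrative choices.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                # P(s, a): hypothetical output of a learned policy network
    visit_count: int = 0        # N(s, a)
    value_sum: float = 0.0      # sum of values backed up through this node
    children: dict = field(default_factory=dict)  # action -> Node

    def q_value(self) -> float:
        # Mean action value Q(s, a); unvisited nodes default to 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.25) -> float:
    # Q(s, a) + c * P(s, a) * sqrt(N(s)) / (1 + N(s, a)):
    # the bonus grows with parent visits, shrinks with child visits,
    # and is scaled by the learned prior.
    bonus = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q_value() + bonus


def select_action(node: Node, c_puct: float = 1.25):
    # Pick the child action with the highest PUCT score.
    return max(node.children, key=lambda a: puct_score(node, node.children[a], c_puct))


def backup(path: list, leaf_value: float) -> None:
    # Propagate a (value-network) leaf estimate to every node on the visited path.
    # Single-agent and undiscounted: the same value is added at every depth.
    for n in path:
        n.visit_count += 1
        n.value_sum += leaf_value


if __name__ == "__main__":
    root = Node(prior=1.0, visit_count=10)
    root.children = {
        "left": Node(prior=0.7, visit_count=5, value_sum=2.0),
        "right": Node(prior=0.3, visit_count=1, value_sum=0.9),
    }
    print(select_action(root))
```

In this example, "left" has the larger prior, but "right" is selected because its higher mean value (0.9 vs 0.4) and low visit count outweigh the prior. The example also shows why the abstract's deployment assumption matters. Both terms of the score come from estimators trained in the original environment, so a change to the environment can distort the prior and the backed-up values at the same time.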