AlphaZero and its successors employ learned value and policy functions to enable more efficient and effective planning at deployment. A standard assumption is that the agent is deployed in the same environment in which these estimators were trained; changes to the environment would otherwise violate their expectations and could result in suboptimal decisions. In this work, we investigate how environment changes affect the usability of the learned estimators and develop criteria that can quickly detect and localize such changes. Building on these criteria, we develop novel planning methods that combine them with further modifications of standard Monte Carlo planning techniques; these methods demonstrate superior performance under several tested environment configurations. We also discuss the main assumptions and limitations of our approaches, providing a foundation for future research to broaden their applicability. The code is available on GitHub.