MuZero is a state-of-the-art reinforcement learning algorithm developed by DeepMind. It achieves superhuman performance in complex domains, most notably popular board games and the Atari suite. Reinforcement learning agents such as MuZero depend critically on effective exploration to discover optimal decision-making strategies. MuZero's standard exploration mechanism combines Monte Carlo Tree Search (MCTS) visit counts with temperature-controlled softmax action selection. This thesis investigates whether MuZero's performance, sample efficiency, and learning trajectory can be improved by systematically evaluating alternative exploration strategies. We examine several modifications to the action-selection process: varying the temperature schedule of the softmax selection, epsilon-greedy action selection, and Thompson sampling over an ensemble of models for the exploration step. These methods offer principled ways of balancing exploration and exploitation and have previously been applied successfully in other reinforcement learning systems. This thesis contributes a comprehensive empirical comparison of these strategies, providing insights into their practical implications for optimising MuZero and advancing the understanding of exploration in complex model-based RL agents.
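As a rough illustration of the action-selection variants compared in this thesis, the sketch below shows visit-count softmax selection with a temperature schedule, epsilon-greedy selection, and Thompson sampling over an ensemble. This is a minimal sketch, not MuZero's actual implementation: the function names, the schedule thresholds, and the `ensemble_q_values` interface are illustrative assumptions.

```python
import numpy as np

def softmax_from_visits(visit_counts, temperature):
    """MuZero-style selection: sample action a with probability
    proportional to N(a)^(1/T), where N(a) is its MCTS visit count
    and T is the temperature (T -> 0 approaches greedy selection)."""
    weights = np.asarray(visit_counts, dtype=np.float64) ** (1.0 / temperature)
    probs = weights / weights.sum()
    return np.random.choice(len(probs), p=probs)

def temperature_schedule(training_step):
    """Example of the kind of schedule studied: anneal T towards
    greedy play as training progresses (thresholds are illustrative)."""
    if training_step < 50_000:
        return 1.0
    if training_step < 75_000:
        return 0.5
    return 0.25

def epsilon_greedy_from_visits(visit_counts, epsilon):
    """Alternative: with probability epsilon take a uniformly random
    action, otherwise take the most-visited action."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(visit_counts))
    return int(np.argmax(visit_counts))

def thompson_sample_action(ensemble_q_values):
    """Alternative: Thompson sampling over an ensemble. One member's
    value estimates are drawn at random and acted on greedily;
    ensemble_q_values has shape [members, actions] (assumed interface)."""
    member = np.random.randint(ensemble_q_values.shape[0])
    return int(np.argmax(ensemble_q_values[member]))
```

In this framing, all three variants differ only in how the statistics produced by the search are mapped to a single action, which is what makes them straightforward drop-in replacements for MuZero's standard softmax step.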