E.L. Malmsten
Please Note
<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
2 records found
Over the past decade, model-based reinforcement learning (MBRL) has become a leading approach for solving complex decision-making problems. A prominent algorithm in this domain is MuZero, which integrates Monte Carlo Tree Search (MCTS) with deep neural networks and a latent world model to predict future states and outcomes. Despite its effectiveness, MuZero is inherently limited by the sequential nature of its search-tree construction during planning. In this work, we address this limitation by introducing TransZero-Parallel, the first model capable of constructing the MCTS search tree without any sequential constraints. This method replaces MuZero’s recurrent dynamics model with a transformer-based network, enabling the computation of a sequence of latent future states in parallel. We combine this with the MVC evaluator, which allows the search tree to be built without depending on the inherently sequential visitation counts. Together with small modifications to the MCTS algorithm, this enables the parallel expansion of entire subtrees within the search tree. Experiments in the MiniGrid and LunarLander environments demonstrate that this combined approach yields up to an eleven-fold reduction in wall-clock time while maintaining sample efficiency. These results highlight the potential of TransZero-Parallel to improve planning performance and reduce training time in model-based RL, bringing the field closer to real-time, real-world applications. The code is available through GitHub.
GitHub footnote: https://github.com/emalmsten/TransZero
The application of large language models (LLMs) to programming tasks, such as automatic code completion, has seen a significant upswing in recent years. However, due to their computational demands, these models have to run on servers. This requires users to have a steady internet connection and raises potential privacy concerns. Therefore, this study explores the feasibility of compressing LLMs for code using knowledge distillation (KD), thereby facilitating local usage of these models. Existing research has primarily focused on the efficacy of using KD to compress BERT models for language tasks. Its application to GPT models for coding tasks, and the impact of applying KD in-training as opposed to during pre-training, remain largely unexplored. To address these gaps, we adapted DistilBERT, a pre-training KD algorithm for distilling BERT models for language tasks. Our adapted model, Distil-CodeGPT, uses in-training KD to compress LLMs for Python code. The findings of this study suggest that a substantial reduction in model size is achievable, albeit accompanied by a loss in predictive accuracy. Specifically, using 8 layers instead of the original 12 resulted in a 24% reduction in disk size and a 28% speed increase, with an accompanying accuracy decrease of 11%. These results show that this approach has potential and is a solid first step toward smaller code models.
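For readers unfamiliar with knowledge distillation, the following is a minimal sketch of a generic soft-target distillation loss, in which a student model is trained to match a teacher's temperature-softened output distribution alongside the usual hard-label loss. It is not taken from the abstract: the function names, the temperature of 2.0, and the weighting `alpha` are illustrative assumptions, and the loss actually used for Distil-CodeGPT may differ.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper: a generic soft-target loss, not necessarily
    the exact loss used in the work described above. Assumes logits of
    moderate size, so no probability underflows to zero.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2

def combined_loss(student_logits, teacher_logits, target_index,
                  alpha=0.5, temperature=2.0):
    """Weighted sum of the soft-target loss and hard-label cross-entropy.

    alpha is an assumed mixing weight, chosen only for illustration.
    """
    hard = -math.log(softmax(student_logits)[target_index])
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * soft + (1.0 - alpha) * hard
```

The soft-target term is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge, which is what lets a smaller student (for example, one with fewer layers) learn from the teacher's full output distribution rather than from the labels alone.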