Multi-Model Routing for Energy-Efficient LLM Code Generation
J.M. Chan (TU Delft - Electrical Engineering, Mathematics and Computer Science)
E. Barba Roque – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
L. Miranda da Cruz – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. van Deursen – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J. Yang – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The introduction of large language models (LLMs) has transformed the way software is written. LLM-powered code generation has increased the productivity of software engineers all over the world. However, these models are computationally expensive, and their ubiquitous use has raised significant sustainability concerns.
LLM routing aims to reduce the use of more complex models by routing easier tasks to smaller models. However, existing research on routing focuses primarily on monetary savings, and the potential of routing from a sustainability perspective has yet to be explored.
In this thesis, we propose an energy-aware LLM routing framework to measure, train, and evaluate various routers. We implement our framework and conduct experiments to quantify the energy efficiency of routing and to examine the trade-offs between accuracy and energy consumption. Furthermore, we analyze the overhead introduced by the various routing components. Our results show that routing can reduce energy consumption by up to 15.3% on the HumanEval and MBPP datasets with minimal overhead when compared to an interpolated baseline. However, overall energy savings decrease significantly as the accuracy target approaches that of the stronger model. These findings show that LLM routing is a viable strategy to reduce the energy consumption of LLM code generation in scenarios where achieving maximum performance is not crucial.
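To make the comparison against an interpolated baseline concrete, the following is a minimal sketch, not the thesis implementation. It assumes the baseline is a random mix of a weak and a strong model, so that energy grows linearly with the accuracy target between the two models. The names ModelProfile, interpolated_energy and relative_saving, and all numbers in the usage lines, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    """Hypothetical per-model summary; not taken from the thesis."""
    accuracy: float       # fraction of tasks solved, in [0, 1]
    energy_joules: float  # mean energy per task


def interpolated_energy(weak: ModelProfile, strong: ModelProfile,
                        target_accuracy: float) -> float:
    """Energy of a random weak/strong mix that reaches target_accuracy.

    Assumption: sending a random share of tasks to the strong model
    moves both accuracy and energy linearly between the two models.
    """
    if strong.accuracy <= weak.accuracy:
        raise ValueError("strong model must be more accurate than weak model")
    if not weak.accuracy <= target_accuracy <= strong.accuracy:
        raise ValueError("target accuracy must lie between the two models")
    # Share of tasks that must go to the strong model.
    share = (target_accuracy - weak.accuracy) / (strong.accuracy - weak.accuracy)
    return weak.energy_joules + share * (strong.energy_joules - weak.energy_joules)


def relative_saving(router_energy: float, baseline_energy: float) -> float:
    """Fraction of baseline energy saved by the router at equal accuracy."""
    return 1.0 - router_energy / baseline_energy


# Illustrative, made-up numbers (not results from the thesis).
weak = ModelProfile(accuracy=0.5, energy_joules=100.0)
strong = ModelProfile(accuracy=0.8, energy_joules=400.0)
baseline = interpolated_energy(weak, strong, target_accuracy=0.65)
saving = relative_saving(router_energy=200.0, baseline_energy=baseline)
print(f"baseline: {baseline:.1f} J, saving: {saving:.1%}")
```

Under this assumption, a router only saves energy if it reaches a given accuracy with less energy than the random mix does. As the target approaches the strong model's accuracy, nearly every task must go to the strong model either way, which is consistent with the shrinking savings reported above.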