Multi-Model Routing for Energy-Efficient LLM Code Generation

Master Thesis (2026)
Author(s)

J.M. Chan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

E. Barba Roque – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

L. Miranda da Cruz – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. van Deursen – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J. Yang – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
28-04-2026
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The introduction of large language models (LLMs) has transformed the way software is written. LLM-powered code generation has increased the productivity of software engineers worldwide. However, these models are computationally expensive, and their ubiquitous use has raised significant sustainability concerns.

LLM routing aims to reduce the use of larger, more complex models by routing easier tasks to smaller models. However, existing research on routing focuses primarily on monetary savings, and the potential of routing from a sustainability perspective has yet to be explored.
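To make the routing idea concrete, here is a minimal Python sketch of a threshold-based binary router. The class name `ThresholdRouter`, the difficulty score, and the threshold are hypothetical. The abstract does not describe how the thesis's routers make decisions, and the length-based scorer in the usage example is only a placeholder for a trained difficulty estimator.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ThresholdRouter:
    """Hypothetical binary router: prompts judged easy go to the smaller
    ("weak") model, prompts judged hard go to the larger ("strong") model."""

    # Assumed difficulty estimator returning a score in [0, 1]. In practice this
    # would be a trained model, not the toy function used below.
    score: Callable[[str], float]
    # Assumed decision threshold: raising it sends more prompts to the weak model.
    threshold: float

    def route(self, prompt: str) -> str:
        """Return the name of the model that should handle this prompt."""
        return "strong" if self.score(prompt) > self.threshold else "weak"

    def strong_fraction(self, prompts: Sequence[str]) -> float:
        """Share of prompts routed to the strong (more expensive) model."""
        if not prompts:
            return 0.0
        return sum(self.route(p) == "strong" for p in prompts) / len(prompts)


if __name__ == "__main__":
    # Toy scorer: longer prompts are treated as harder (placeholder only).
    router = ThresholdRouter(score=lambda p: min(len(p) / 100, 1.0), threshold=0.5)
    prompts = ["Reverse a string.", "x" * 80]
    print([router.route(p) for p in prompts])  # ['weak', 'strong']
    print(router.strong_fraction(prompts))     # 0.5
```

The threshold is the knob that trades accuracy for energy. Sending fewer prompts to the strong model saves energy only as long as the weak model can still solve them.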

In this thesis, we propose an energy-aware LLM routing framework to measure, train, and evaluate various routers. We implement our framework and conduct experiments to quantify the energy efficiency of routing and to examine the trade-offs between accuracy and energy consumption. Furthermore, we analyze the overhead introduced by the various routing components. Our results show that, compared to an interpolated baseline, routing can reduce energy consumption by up to 15.3% on the HumanEval and MBPP datasets with minimal overhead. However, the energy savings decrease significantly as the accuracy target approaches that of the stronger model. These findings show that LLM routing is a viable strategy to reduce the energy consumption of LLM code generation in scenarios where achieving maximum performance is not crucial.
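The abstract reports savings relative to an interpolated baseline but does not define it. The following is a minimal sketch that assumes the baseline is a random mix of the weaker and stronger model, with accuracy and energy scaling linearly in the share of queries sent to the stronger model. This is an assumption and not necessarily the thesis's exact definition. The function names and the example numbers are hypothetical placeholders, not results from the thesis.

```python
def interpolated_baseline_energy(target_acc, weak, strong):
    """Energy of a random two-model mix that reaches `target_acc`.

    `weak` and `strong` are hypothetical (accuracy, energy) pairs. Assumes both
    accuracy and energy interpolate linearly in the fraction p of queries that
    are sent to the strong model.
    """
    acc_w, e_w = weak
    acc_s, e_s = strong
    if acc_s <= acc_w:
        raise ValueError("strong model must be more accurate than weak model")
    if not acc_w <= target_acc <= acc_s:
        raise ValueError("target accuracy must lie between the two models")
    p = (target_acc - acc_w) / (acc_s - acc_w)  # share sent to the strong model
    return e_w + p * (e_s - e_w)


def energy_savings(router_acc, router_energy, weak, strong):
    """Relative energy saved by a router versus the baseline at equal accuracy."""
    baseline = interpolated_baseline_energy(router_acc, weak, strong)
    return 1 - router_energy / baseline


if __name__ == "__main__":
    # Illustrative placeholder values (accuracy in %, energy in arbitrary units).
    print(interpolated_baseline_energy(70, weak=(60, 100), strong=(80, 300)))  # 200.0
    print(energy_savings(70, 150, weak=(60, 100), strong=(80, 300)))          # 0.25
```

Under this assumed baseline, a router counts as energy-saving only if it reaches a given accuracy more cheaply than mixing the two models at random.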
