Multi-Model Routing for Energy-Efficient LLM Code Generation

Master Thesis (2026)

Author(s)

J.M. Chan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

E. Barba Roque – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

L. Miranda da Cruz – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. van Deursen – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J. Yang – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Routing, Green AI, LLMs4Code

To reference this document use

https://resolver.tudelft.nl/uuid:c375c7d0-cd1f-4292-9646-06a2afd61da3

More Info

Publication Year

2026

Language

English

Graduation Date

28-04-2026

Awarding Institution

Delft University of Technology

Programme

Computer Science

Downloads counter

51

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The introduction of large language models (LLMs) has transformed the way software is written. LLM-powered code generation has increased the productivity of software engineers all over the world. However, these models are computationally expensive, and their ubiquitous use has raised significant sustainability concerns.

LLM routing aims to reduce the usage of more complex models by sending easier tasks to smaller models. However, existing research on routing focuses primarily on monetary savings, and the potential of routing from a sustainability perspective has yet to be explored.

In this thesis, we propose an energy-aware LLM routing framework to measure, train and evaluate various routers. We implement our framework and conduct experiments to quantify the energy efficiency of routing and to examine the trade-offs between accuracy and energy consumption. Furthermore, we analyze the overhead introduced by the various routing components. Our results show that routing can reduce energy consumption by up to 15.3% on the HumanEval and MBPP datasets, with minimal overhead, when compared to an interpolated baseline. However, overall energy savings were found to decrease significantly as the accuracy target approaches that of the stronger model. These findings show that LLM routing is a viable strategy to reduce the energy consumption of LLM code generation in scenarios where achieving maximum performance is not crucial.
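
As a rough illustration of how an energy saving relative to an interpolated baseline could be computed, the Python sketch below assumes the baseline sends a random fraction of tasks to the stronger model, so that both accuracy and energy interpolate linearly between the two models. This is an assumption made for illustration; the thesis may define its baseline differently. The names `ModelStats`, `interpolated_energy` and `energy_saving`, and all numbers, are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Hypothetical per-model summary; not a structure from the thesis."""
    accuracy: float  # fraction of tasks solved, in [0, 1]
    energy_j: float  # mean energy per task, in joules


def interpolated_energy(weak: ModelStats, strong: ModelStats,
                        target_accuracy: float) -> float:
    """Energy of a baseline that reaches target_accuracy by sending a random
    fraction of tasks to the strong model (assumed linear interpolation)."""
    if not weak.accuracy <= target_accuracy <= strong.accuracy:
        raise ValueError("target accuracy must lie between the two models")
    if strong.accuracy == weak.accuracy:
        return weak.energy_j
    # Fraction of tasks the baseline must send to the strong model.
    frac = (target_accuracy - weak.accuracy) / (strong.accuracy - weak.accuracy)
    return weak.energy_j + frac * (strong.energy_j - weak.energy_j)


def energy_saving(router_energy_j: float, baseline_energy_j: float) -> float:
    """Relative saving of a router against the baseline at equal accuracy."""
    return 1.0 - router_energy_j / baseline_energy_j


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    weak = ModelStats(accuracy=0.60, energy_j=100.0)
    strong = ModelStats(accuracy=0.80, energy_j=300.0)
    baseline = interpolated_energy(weak, strong, target_accuracy=0.70)
    # 170.0 J is an invented router measurement that includes routing overhead.
    print(f"baseline: {baseline:.1f} J, "
          f"saving: {energy_saving(170.0, baseline):.1%}")
```

Under this assumption, a router only saves energy if, at a given accuracy, its total cost (including its own overhead) falls below the straight line between the two models. The gap to that line necessarily shrinks as the target accuracy approaches that of the stronger model, which is consistent with the trend reported above.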

Files

MasterThesisMichael.pdf

(pdf | 1.88 Mb)

License info not available