Distilling CodeT5 for Efficient On-Device Test-Assertion Generation
Combining response-based distillation and architectural tuning to deliver near-teacher quality on resource-constrained devices
A.V. Nicula (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Panichella – Mentor (TU Delft - Software Engineering)
Mitchell Olsthoorn – Mentor (TU Delft - Software Engineering)
Petr Kellnhofer – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Abstract
Writing clear, semantically rich test assertions remains a major bottleneck in software development. While large pre-trained models such as CodeT5 excel at synthesizing assertions, their size and latency make them impractical for on-premise or resource-constrained workflows. In this work, we introduce a knowledge-distillation pipeline that transfers knowledge from CodeT5-base, a pre-trained encoder–decoder Transformer based on the T5 architecture, into sub-1 GB student models tailored specifically for test-assertion generation. Our pipeline combines response-based distillation using soft labels with hard-label fine-tuning, explores custom student architectures that compare pre-trained initialization against random initialization, and applies targeted regularization techniques. We instantiate students at several size points and conduct an empirical evaluation on standard assertion benchmarks, measuring exact-match accuracy, similarity, RAM footprint, and CPU and GPU inference latency. Our best 230 MB student retains over 80% of the teacher’s exact-match assertion-generation accuracy and over 90% of its similarity score, while running inference in under 3 seconds on a single consumer-grade CPU and using 75% less RAM. These results demonstrate that distilled code-LLMs can deliver near-teacher assertion quality under tight memory and latency constraints, paving the way for fully on-device IDE integration and low-overhead continuous-integration workflows.
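
As a rough illustration of the response-based distillation objective described in the abstract, the sketch below (in PyTorch) blends a temperature-softened soft-label term, which matches the student's output distribution to the teacher's, with a hard-label cross-entropy term against the reference assertions. The function name, the temperature, and the weighting factor alpha are illustrative assumptions for exposition and do not reproduce the thesis's exact training code.

    # Minimal sketch of a combined soft/hard-label distillation loss
    # (assumed hyperparameters: temperature=2.0, alpha=0.5).
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, target_ids,
                          temperature=2.0, alpha=0.5):
        """student_logits, teacher_logits: (batch, seq_len, vocab) decoder outputs.
        target_ids: (batch, seq_len) ground-truth assertion token ids."""
        # Soft labels: KL divergence between the student's and the teacher's
        # temperature-softened distributions, scaled by T^2 as is conventional.
        soft_loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)

        # Hard labels: standard cross-entropy against the reference assertions.
        hard_loss = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)),
            target_ids.view(-1),
        )

        # alpha weights the soft-label term; (1 - alpha) the hard-label term.
        return alpha * soft_loss + (1 - alpha) * hard_loss

In practice such a loss is computed per training batch, with the teacher's logits obtained from a frozen CodeT5-base forward pass; the relative weighting of the two terms is a tunable design choice.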