Distilling CodeT5 for Efficient On-Device Test-Assertion Generation
Combining response-based distillation and architectural tuning to deliver near-teacher quality on resource-constrained devices
A.V. Nicula (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Panichella – Mentor (TU Delft - Software Engineering)
Mitchell Olsthoorn – Mentor (TU Delft - Software Engineering)
Petr Kellnhofer – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Abstract
Writing clear, semantically rich test assertions remains a major bottleneck in software development. While large pre-trained models such as CodeT5 excel at synthesizing assertions, their size and latency make them impractical for on-premise or resource-constrained workflows. In this work, we introduce a knowledge-distillation pipeline that transfers knowledge from CodeT5-base, a pre-trained encoder–decoder Transformer based on the T5 architecture, into sub-1 GB student models tailored specifically for test-assertion generation. Our pipeline combines response-based distillation using soft labels with hard-label fine-tuning, explores custom student architectures that compare pre-trained initialization against random initialization, and applies targeted regularization techniques. We instantiate students at several size points and conduct an empirical evaluation on standard assertion benchmarks, measuring exact-match accuracy, similarity, RAM footprint, and CPU and GPU inference latency. Our best 230 MB student retains over 80% of the teacher’s exact-match assertion-generation accuracy and over 90% of its similarity score, while running inference in under 3 seconds on a single consumer-grade CPU and using 75% less RAM. These results demonstrate that distilled code-LLMs can deliver near-teacher assertion quality under tight memory and latency constraints, paving the way for fully on-device IDE integration and low-overhead continuous-integration workflows.
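
As a rough illustration of the response-based distillation objective described in the abstract, the sketch below (in PyTorch) blends a temperature-softened soft-label term, which matches the student's output distribution to the teacher's, with a hard-label cross-entropy term against the reference assertions. The function name, the temperature, and the weighting factor alpha are illustrative assumptions for exposition and do not reproduce the thesis's exact training code.

    # Minimal sketch of a combined soft/hard-label distillation loss
    # (assumed hyperparameters: temperature=2.0, alpha=0.5).
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, target_ids,
                          temperature=2.0, alpha=0.5):
        """student_logits, teacher_logits: (batch, seq_len, vocab) decoder outputs.
        target_ids: (batch, seq_len) ground-truth assertion token ids."""
        # Soft labels: KL divergence between the student's and the teacher's
        # temperature-softened distributions, scaled by T^2 as is conventional.
        soft_loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)

        # Hard labels: standard cross-entropy against the reference assertions.
        hard_loss = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)),
            target_ids.view(-1),
        )

        # alpha weights the soft-label term; (1 - alpha) the hard-label term.
        return alpha * soft_loss + (1 - alpha) * hard_loss

In practice such a loss is computed per training batch, with the teacher's logits obtained from a frozen CodeT5-base forward pass; the relative weighting of the two terms is a tunable design choice.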