Creating Local LLMs for Test Assertion Generation: A Comparative Study of Knowledge Distillation from CodeT5
G. Dimitrov (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Panichella – Mentor (TU Delft - Software Engineering)
Mitchell Olsthoorn – Mentor (TU Delft - Software Engineering)
Petr Kellnhofer – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Abstract
Effective test assertions are important for software quality, but writing them is time-consuming. While Large Language Models (LLMs) show promise for automated assertion generation, their size, cost, resource demands, and reliance on an online connection often render them impractical for widespread developer use. Knowledge Distillation (KD) offers a way to bridge this gap by transferring capabilities from a large "teacher" LLM to smaller "student" models (SLMs). However, most prior work on KD has focused on classification tasks rather than generative problems. This paper investigates the feasibility of test assertion generation using response-based KD from a CodeT5-base teacher. We specifically explore the impact of three parameters on assertion quality and model efficiency: student model size (number of layers), pretraining initialization, and the weighting of the ground-truth and distillation losses. Our results demonstrate that distilled small student models (231 MB), particularly those initialized from pretrained checkpoints and fine-tuned with an equal loss weight (α = 0.5) on the ground-truth and distillation losses, retain a substantial portion of the teacher's assertion generation performance on the defined metrics, achieving around 83.9% of the teacher's CodeBERTScore at just 25.9% of its size. This work provides empirical insights into creating specialized SLMs for test assertion generation, highlighting practical configurations for deployment in development environments.
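To make the loss-weighting parameter concrete, the sketch below shows one common way a response-based KD objective with α = 0.5 can be implemented in PyTorch: a weighted sum of the ground-truth cross-entropy loss and a KL-divergence loss against the teacher's output distribution. The function name, the softmax temperature, the padding handling, and the exact assignment of α to the two terms are illustrative assumptions and not necessarily the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def kd_assertion_loss(student_logits, teacher_logits, target_ids,
                      alpha=0.5, temperature=2.0, pad_id=0):
    """Response-based KD objective (illustrative): combines the
    ground-truth loss and the distillation loss with weight alpha."""
    # Ground-truth loss: cross-entropy against the reference assertion tokens.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=pad_id,
    )
    # Distillation loss: match the teacher's softened token distribution
    # (temperature-scaled, as in standard response-based KD).
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # With alpha = 0.5 the two terms are weighted equally, matching the
    # best-performing configuration reported in the abstract.
    return alpha * ce + (1.0 - alpha) * kd
```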