Writing clear, semantically rich test assertions remains a major bottleneck in software development. While large pre-trained models such as CodeT5 excel at synthesizing assertions, their size and latency make them impractical for on-premise or resource-constrained workflows. In this work, we introduce a knowledge-distillation pipeline that transfers knowledge from CodeT5-base, a pre-trained encoder–decoder Transformer based on the T5 architecture, into sub-1 GB student models tailored specifically for test-assertion generation. Our pipeline combines response-based distillation on the teacher's soft labels with hard-label fine-tuning, explores custom student architectures that compare pre-trained against randomly initialized students, and applies targeted regularization techniques. We instantiate students at several size points and conduct an empirical evaluation on standard assertion benchmarks, measuring exact-match accuracy, similarity, RAM footprint, and CPU and GPU inference latency. Our best 230 MB student retains over 80% of the teacher's exact-match assertion-generation accuracy and over 90% of its similarity score, while running inference in under 3 seconds on a single consumer-grade CPU and using 75% less RAM. These results demonstrate that distilled code LLMs can deliver near-teacher assertion quality under tight memory and latency constraints, paving the way for fully on-device IDE integration and low-overhead continuous-integration workflows.
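To make the combined objective concrete, the following is a minimal PyTorch-style sketch of response-based distillation that mixes a temperature-softened soft-label term with a hard-label cross-entropy term; it is not the paper's exact implementation, and the names `alpha`, `temperature`, and `pad_token_id` are illustrative hyperparameters rather than values reported in this work.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target_ids,
                      temperature=2.0, alpha=0.5, pad_token_id=0):
    """Sketch of a combined soft-label + hard-label distillation objective.

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    target_ids: (batch, seq_len) ground-truth assertion token ids
    """
    # Soft-label term: KL divergence between temperature-softened
    # teacher and student token distributions (response-based distillation).
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_loss = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Hard-label term: standard cross-entropy against the reference assertion.
    ce_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=pad_token_id,
    )

    # Weighted combination of the two objectives.
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In this formulation, `alpha` trades off imitation of the teacher's output distribution against direct supervision from the ground-truth assertions, and the `temperature ** 2` factor is the usual rescaling that keeps the gradient magnitudes of the soft-label term comparable to the hard-label term.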