BreachT5: Ensembling CodeT5+ Models for Multi-Label Vulnerability Detection in Smart Contracts
T.L. Nguyen (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Annibale Panichella – Mentor (TU Delft - Software Engineering)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Detecting vulnerabilities in smart contracts is critical because deployed contracts are immutable and secure billions of dollars. Industrial tools like Slither rely on hardcoded rules, often missing rare bugs or producing excessive false positives. Recent work has applied large language models (LLMs) such as GPT-5 to this task, but these models favor precision while failing to recall many true issues, especially in multi-label settings.
We first fine-tune a 220M CodeT5+ model on over 67,000 real-world Ethereum contracts to establish a per-class detectability baseline, revealing which SWC vulnerabilities are intrinsically easier or harder to detect. We then study scaling effects, showing that the 770M variant improves majority-class precision but loses rare-class sensitivity.
To reconcile this trade-off, we propose BreachT5, a soft-voting ensemble of both scales with tuned thresholds to balance recall and precision. BreachT5 achieves 0.556 Macro-F1 and 0.612 Micro-F1, outperforming standalone models, Slither, and GPT-5 on multi-label vulnerability detection in smart contract security.
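As a rough illustration of the ideas named above, the following sketch averages per-class probabilities from two models (soft voting), applies one decision threshold per class, and computes Macro-F1 and Micro-F1. It is not the thesis implementation: the function names, the equal weighting, the threshold values, and the toy probabilities are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def soft_vote(prob_a: Matrix, prob_b: Matrix, weight_a: float = 0.5) -> Matrix:
    """Weighted average of two models' per-class probabilities.

    weight_a = 0.5 (equal weighting) is an assumption, not a value from the abstract.
    """
    return [
        [weight_a * pa + (1.0 - weight_a) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(prob_a, prob_b)
    ]


def apply_thresholds(probs: Matrix, thresholds: Sequence[float]) -> List[List[int]]:
    """Turn probabilities into multi-label predictions, one threshold per class."""
    return [[int(p >= t) for p, t in zip(row, thresholds)] for row in probs]


def macro_micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Macro-F1 averages per-class F1; Micro-F1 pools counts over all classes."""
    n_labels = len(y_true[0])
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_class_f1) / n_labels
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return macro, micro


# Toy data (hypothetical): 3 contracts, 2 vulnerability classes.
small_model_probs = [[0.9, 0.4], [0.2, 0.6], [0.7, 0.1]]  # stand-in for the 220M model
large_model_probs = [[0.8, 0.1], [0.1, 0.3], [0.9, 0.2]]  # stand-in for the 770M model
thresholds = [0.5, 0.3]  # assumed values; a lower threshold favors recall on a rare class

ensemble_probs = soft_vote(small_model_probs, large_model_probs)
predictions = apply_thresholds(ensemble_probs, thresholds)

ground_truth = [[1, 0], [0, 1], [1, 1]]
macro_f1, micro_f1 = macro_micro_f1(ground_truth, predictions)
print(predictions, round(macro_f1, 3), round(micro_f1, 3))
```

In this toy setup, lowering the threshold for the second class lets the ensemble recover a label that a uniform 0.5 cutoff would miss, which is the kind of recall and precision balancing the abstract describes. Macro-F1 weights every class equally, so it is more sensitive to rare-class performance than Micro-F1.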