BreachT5: Ensembling CodeT5+ Models for Multi-Label Vulnerability Detection in Smart Contracts

Master Thesis (2025)
Author(s)

T.L. Nguyen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Annibale Panichella – Mentor (TU Delft - Software Engineering)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
25-08-2025
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Detecting vulnerabilities in smart contracts is critical because deployed contracts are immutable and secure billions of dollars. Industrial tools like Slither rely on hardcoded rules, so they often miss rare bugs or produce excessive false positives. Recent work has applied large language models (LLMs) such as GPT-5 to this task, but these models favor precision and fail to recall many true issues, especially in multi-label settings.

We first fine-tune a 220M-parameter CodeT5+ model on over 67,000 real-world Ethereum contracts to establish a per-class detectability baseline, revealing which SWC (Smart Contract Weakness Classification) vulnerabilities are intrinsically easier or harder to detect. We then study scaling effects, showing that the 770M-parameter variant improves majority-class precision but loses rare-class sensitivity.

To reconcile this trade-off, we propose BreachT5, a soft-voting ensemble of the 220M and 770M models with tuned decision thresholds that balance recall and precision. BreachT5 achieves 0.556 Macro-F1 and 0.612 Micro-F1 on multi-label smart contract vulnerability detection, outperforming the standalone models, Slither, and GPT-5.
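
A minimal sketch of how a soft-voting ensemble with per-class thresholds, and the Macro-F1 and Micro-F1 metrics reported above, could be computed. The function names, the equal model weighting, and the idea of one threshold per class are illustrative assumptions; the abstract does not specify the thesis's actual weights, thresholds, or tuning procedure.

```python
from typing import List, Tuple

Probs = List[List[float]]   # shape: [n_contracts][n_classes], values in [0, 1]
Labels = List[List[int]]    # shape: [n_contracts][n_classes], values 0 or 1


def soft_vote(probs_a: Probs, probs_b: Probs, weight_a: float = 0.5) -> Probs:
    """Weighted average of the per-class probabilities of two models.

    weight_a = 0.5 (equal weighting) is an assumption, not a value from the thesis.
    """
    return [
        [weight_a * pa + (1.0 - weight_a) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]


def apply_thresholds(probs: Probs, thresholds: List[float]) -> Labels:
    """Turn averaged probabilities into multi-label predictions, one threshold per class."""
    return [[1 if p >= t else 0 for p, t in zip(row, thresholds)] for row in probs]


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true: Labels, y_pred: Labels) -> Tuple[float, float]:
    """Macro-F1 averages per-class F1 scores; Micro-F1 pools counts over all classes."""
    n_classes = len(y_true[0])
    per_class = []
    total_tp = total_fp = total_fn = 0
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        per_class.append(_f1(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_class) / n_classes
    micro = _f1(total_tp, total_fp, total_fn)
    return macro, micro


# Toy data (hypothetical): 3 contracts, 2 vulnerability classes.
probs_small = [[0.9, 0.4], [0.2, 0.6], [0.7, 0.1]]   # e.g. the smaller model
probs_large = [[0.7, 0.2], [0.4, 0.2], [0.9, 0.3]]   # e.g. the larger model
y_true = [[1, 0], [0, 1], [1, 1]]

avg = soft_vote(probs_small, probs_large)
y_pred = apply_thresholds(avg, thresholds=[0.5, 0.35])  # lower threshold for class 1
macro_f1, micro_f1 = macro_micro_f1(y_true, y_pred)
print(y_pred, round(macro_f1, 3), round(micro_f1, 3))
```

Because Macro-F1 weights every class equally, lowering the threshold of a rare class to recover missed positives can raise it noticeably, while Micro-F1 is dominated by the frequent classes. In practice the thresholds would be tuned on a held-out validation split rather than set by hand as in this toy example.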
