BreachT5: Ensembling CodeT5+ Models for Multi-Label Vulnerability Detection in Smart Contracts

Master Thesis (2025)

Author(s)

T.L. Nguyen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Panichella – Mentor (TU Delft - Software Engineering)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

https://resolver.tudelft.nl/uuid:2d37458e-9a2b-4923-8293-ace7918e7caa

Publication Year

2025

Language

English

Graduation Date

25-08-2025

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Detecting vulnerabilities in smart contracts is critical because deployed contracts are immutable and secure billions of dollars. Industrial tools like Slither rely on hardcoded rules, so they often miss rare bugs or produce excessive false positives. Large language models (LLMs) such as GPT-5 have recently been applied to this task, but they favor precision at the expense of recall and miss many true issues, especially in multi-label settings.

We first fine-tune the 220M-parameter CodeT5+ model on over 67,000 real-world Ethereum contracts to establish a per-class detectability baseline, revealing which SWC vulnerabilities are intrinsically easier or harder to detect. We then study scaling effects, showing that the 770M-parameter variant improves precision on majority classes but loses sensitivity on rare classes.

To reconcile this trade-off, we propose BreachT5, a soft-voting ensemble of the two model sizes whose decision thresholds are tuned to balance recall and precision. BreachT5 achieves 0.556 Macro-F1 and 0.612 Micro-F1, outperforming the standalone models, Slither, and GPT-5 on multi-label vulnerability detection in smart contracts.
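
To make the ensemble idea concrete, the following is a minimal sketch of soft voting with per-class thresholds, together with the Macro-F1 and Micro-F1 metrics the abstract reports. It is not the thesis implementation: the function names, the equal model weights, and the threshold values are illustrative assumptions, and the per-class probabilities are assumed to come from the two fine-tuned models.

```python
from typing import List, Sequence, Tuple


def soft_vote(prob_small: Sequence[float], prob_large: Sequence[float],
              weight_small: float = 0.5) -> List[float]:
    """Weighted average of per-class probabilities from two models.

    The 0.5 default weight is an assumption, not a value from the thesis.
    """
    if len(prob_small) != len(prob_large):
        raise ValueError("probability vectors must have the same length")
    w = weight_small
    return [w * a + (1.0 - w) * b for a, b in zip(prob_small, prob_large)]


def apply_thresholds(probs: Sequence[float],
                     thresholds: Sequence[float]) -> List[int]:
    """Turn averaged probabilities into multi-label 0/1 predictions,
    using one (hypothetical) tuned threshold per vulnerability class."""
    return [int(p >= t) for p, t in zip(probs, thresholds)]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true: Sequence[Sequence[int]],
                   y_pred: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Macro-F1 averages per-class F1; Micro-F1 pools counts over classes."""
    n_classes = len(y_true[0])
    per_class = []
    tot_tp = tot_fp = tot_fn = 0
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        per_class.append(f1(tp, fp, fn))
        tot_tp, tot_fp, tot_fn = tot_tp + tp, tot_fp + fp, tot_fn + fn
    return sum(per_class) / n_classes, f1(tot_tp, tot_fp, tot_fn)


# Illustrative probabilities for three classes from the two model sizes.
avg = soft_vote([0.9, 0.2, 0.4], [0.7, 0.1, 0.8])
labels = apply_thresholds(avg, [0.5, 0.1, 0.7])

# A rare class (second column) that is never predicted lowers Macro-F1
# much more than Micro-F1.
macro, micro = macro_micro_f1([[1, 0], [1, 0], [1, 1]],
                              [[1, 0], [1, 0], [1, 0]])
```

In this toy example a missed rare class halves Macro-F1 while Micro-F1 stays high, which is why lowering the threshold for rare classes can be worth some loss of precision.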

Files

BreachT5_Ensembling_CodeT5_Mod... (pdf)

(pdf | 0.634 Mb)

License info not available