Evaluating SURP MIA performance on code samples

Bachelor Thesis (2026)
Author(s)

Ísak Jónsson (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Izadi – Mentor (TU Delft - Software Engineering)

A. Al-Kaswan – Mentor (TU Delft - Software Engineering)

J.B. Katzy – Mentor (TU Delft - Software Engineering)

R.L. Lagendijk – Graduation committee member (TU Delft - Cyber Security)

Publication Year
2026
Language
English
Graduation Date
28-01-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Code language models are pretrained on massive datasets scraped from public repositories, and these datasets are rarely disclosed. Membership Inference Attacks (MIAs) aim to predict whether specific samples were used in training, but attack performance is contested. Previous work has shown that many attacks on LLMs perform randomly when evaluated on independent and identically distributed (i.i.d.) members and non-members. We consider three MIAs on StarCoder2-3B and Mellum-4B: LOSS, MinK%, and SURP, where each attack extends the previous one by further filtering the tokens that contribute to the membership signal. We use the AISE MIA dataset, which contains 100,000 Java files with verified membership labels, and address a gap in the evaluation of these attacks on i.i.d. code samples and in the detailed comparison of SURP and MinK%. A bag-of-words (BoW) classifier is used to measure distribution shift; under i.i.d. conditions its expected ROC-AUC is 0.5. The classifier reaches a ROC-AUC of 0.91, confirming substantial distribution shift. We apply two debiasing procedures to construct evaluation subsets: selecting samples close to the BoW decision boundary reduces the BoW ROC-AUC to 0.66, while selecting BoW-misclassified samples fails to reduce the shift. After debiasing, all attacks perform at or below the bag-of-words baseline, with ROC-AUC between 0.55 and 0.63 and TPR at 5% FPR between 0.05 and 0.16, suggesting random performance under strict i.i.d. conditions. Hyperparameter ablation reveals that SURP collapses to MinK% under optimization: optimal configurations either disable SURP filtering or, with one outlier excluded, agree with MinK% classifications on more than 94% of samples. These results extend prior natural-language findings to code: reference-free attacks exploit distributional differences rather than detecting membership.
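
For concreteness, the three membership scores can be read as nested token filters over per-token log-probabilities: LOSS averages all tokens, MinK% averages only the k% least likely tokens, and SURP applies a further filter before that average. Below is a minimal PyTorch sketch, assuming a Hugging Face-style causal LM; the entropy-threshold form of the SURP filter and all hyperparameter values (k, entropy_max) are illustrative assumptions, not the thesis configuration.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def mia_scores(model, input_ids, k=0.2, entropy_max=2.0):
        """Per-sample LOSS, MinK%, and SURP scores; higher = more non-member-like."""
        logits = model(input_ids).logits[0, :-1]    # (T-1, vocab): next-token logits
        targets = input_ids[0, 1:]                  # (T-1,): the tokens actually observed
        logp = F.log_softmax(logits, dim=-1)
        tok_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # per-token log-prob

        # LOSS: mean negative log-likelihood over all tokens.
        loss_score = -tok_logp.mean().item()

        # MinK%: average only over the k% least likely tokens.
        k_n = max(1, int(k * tok_logp.numel()))
        mink_score = -tok_logp.topk(k_n, largest=False).values.mean().item()

        # SURP (assumed form): before the MinK%-style average, keep only tokens
        # at positions where the model's predictive entropy is below a threshold.
        entropy = -(logp.exp() * logp).sum(-1)      # (T-1,)
        kept = tok_logp[entropy < entropy_max]
        if kept.numel() == 0:
            surp_score = loss_score                 # fall back if the filter empties the sample
        else:
            surp_score = -kept.topk(min(k_n, kept.numel()), largest=False).values.mean().item()
        return loss_score, mink_score, surp_score

Note how setting the filter wide open (large entropy_max) makes SURP coincide with MinK%, which is the collapse the ablation in this thesis observes.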

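The distribution-shift check and the boundary-proximity debiasing step can be sketched together: a simple classifier on token counts should score near ROC-AUC 0.5 on a truly i.i.d. split, and the samples it is least sure about form the debiased subset. A minimal scikit-learn sketch follows; the vectorizer settings, classifier choice, and kept fraction are illustrative assumptions, not the thesis configuration.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import roc_auc_score

    def bow_shift_probs(files, labels):
        """files: list of source strings; labels: 1 = member, 0 = non-member."""
        X = CountVectorizer(max_features=50_000).fit_transform(files)
        clf = LogisticRegression(max_iter=1000)
        # Cross-validated probabilities so the AUC reflects distribution shift,
        # not the classifier memorizing the evaluation samples.
        return cross_val_predict(clf, X, labels, cv=5, method="predict_proba")[:, 1]

    def near_boundary_subset(probs, frac=0.3):
        """Boundary-proximity debiasing: keep the samples the BoW classifier is
        least sure about (predicted probability closest to 0.5)."""
        order = np.argsort(np.abs(np.asarray(probs) - 0.5))
        return order[: int(frac * len(order))]      # indices of the debiased subset

    # probs = bow_shift_probs(files, labels)
    # roc_auc_score(labels, probs)   # ~0.5 under i.i.d.; 0.91 reported in this thesis
    # keep = near_boundary_subset(probs)  # re-check AUC on this subset (0.66 reported)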