An Exploratory Study Into Polarized Augment Calibration for Membership Inference in Code LLMs

Bachelor Thesis (2026)
Author(s)

R. Koohestani (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Izadi – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.B. Katzy – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Al-Kaswan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

R.L. Lagendijk – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
28-01-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large Language Models (LLMs) for code are trained on large amounts of data that may contain copyrighted and licensed content. This motivates internal auditing methods that can test whether specific data points were included in training. In this work we conduct an exploratory evaluation of membership inference attacks (MIAs) as auditing signals for code-specialized LLMs. We compare a loss-based baseline to Polarized Augment Calibration (PAC) across three open models in the 3B to 4B parameter range (Mellum-4B, StarCoder2-3B, and SmolLM3-3B), using the Java subset of a contamination-controlled evaluation dataset. We find that PAC provides consistent improvements over the loss signal on the code models, and that near-member samples are detected almost as effectively as exact members. A stratified analysis shows that attack performance varies substantially with file properties. Separability is strongest on small-to-medium files and on code with higher alphanumeric content, and it degrades on very large files. Motivated by the syntactic fragility of token-swap augmentation on code, we propose PAC-AST, an AST-guided augmentation scheme that generates syntactically valid neighbors. PAC-AST behaves better on larger and syntactically complex files, where token-swap PAC degrades. However, it underperforms in smaller and alphanumeric-rich strata, due in part to a reduced effective mutation magnitude. Overall, the results indicate that (i) calibration-based signals can strengthen grey-box auditing for code models, (ii) dataset and program characteristics are major drivers of measured leakage, and (iii) code-specific augmentation is a promising direction but requires controlling perturbation magnitude and neighbor quality to yield stable gains.
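The abstract contrasts a loss baseline with augment calibration over token-swap neighbors. The code below is a minimal sketch of that contrast. It assumes a hypothetical `logprob_fn` that returns per-token log-probabilities from the model under test. It is not the thesis's implementation. PAC calibrates its own polarized-distance statistic, and this sketch plainly substitutes the mean token log-likelihood for it. The names `calibrated_score`, `token_swap_neighbor`, `swap_ratio` and `n_neighbors`, their default values, and the toy bigram "model" are all illustrative assumptions. PAC-AST's AST-guided neighbors would need a Java parser and are not shown.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical interface: maps a token sequence to one log-probability per
# predicted token under the target model. A real audit would wrap a forward
# pass of the model under test; it is left abstract so the sketch runs offline.
LogProbFn = Callable[[Sequence[str]], List[float]]


def loss_score(tokens: Sequence[str], logprob_fn: LogProbFn) -> float:
    """Loss baseline: the negated mean token loss, so higher means 'more likely a member'."""
    return mean(logprob_fn(tokens))


def token_swap_neighbor(tokens: Sequence[str], n_swaps: int,
                        rng: random.Random) -> List[str]:
    """Token-swap augmentation: exchange n_swaps random pairs of positions.
    On source code this usually produces syntactically invalid neighbors."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def calibrated_score(tokens: Sequence[str], logprob_fn: LogProbFn,
                     n_neighbors: int = 8, swap_ratio: float = 0.1,
                     seed: int = 0) -> float:
    """Augment calibration: the sample's statistic minus the mean statistic of
    its token-swap neighbors. PAC calibrates its polarized distance; this
    sketch substitutes the mean log-likelihood (loss_score) for simplicity."""
    rng = random.Random(seed)
    n_swaps = max(1, int(swap_ratio * len(tokens)))
    neighbors = [token_swap_neighbor(tokens, n_swaps, rng) for _ in range(n_neighbors)]
    neighbor_mean = mean([loss_score(nb, logprob_fn) for nb in neighbors])
    return loss_score(tokens, logprob_fn) - neighbor_mean


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """AUC as P(random member outscores random non-member); ties count as 1/2."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else (0.5 if m == n else 0.0)
    return wins / (len(member_scores) * len(nonmember_scores))


if __name__ == "__main__":
    # Hypothetical toy "model" that memorized one Java method: a token is easy
    # (-0.1) if it continues a bigram seen in training, otherwise hard (-4.0).
    member = "public int add ( int a , int b ) { return a + b ; }".split()
    nonmember = "public int mul ( int x , int y ) { return x * y ; }".split()
    seen = set(zip(member, member[1:]))

    def toy_logprob_fn(tokens: Sequence[str]) -> List[float]:
        return [-0.1 if pair in seen else -4.0 for pair in zip(tokens, tokens[1:])]

    loss_auc = roc_auc([loss_score(member, toy_logprob_fn)],
                       [loss_score(nonmember, toy_logprob_fn)])
    print("loss AUC:", loss_auc)  # 1.0 on this toy pair
    print("calibrated scores:",
          calibrated_score(member, toy_logprob_fn),
          calibrated_score(nonmember, toy_logprob_fn))
```

On a real model, the same scoring could be repeated over the member and non-member files of each stratum, with `roc_auc` computed per stratum. A syntax-preserving neighbor generator, in the spirit of PAC-AST, would only need to replace `token_swap_neighbor`. The rest of the calibration loop would stay unchanged.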

https://zenodo.org/records/18367988
https://doi.org/10.5281/zenodo.18367987
