An Exploratory Study Into Polarized Augment Calibration for Membership Inference in Code LLMs
R. Koohestani (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Izadi – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.B. Katzy – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Al-Kaswan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
R.L. Lagendijk – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Large Language Models (LLMs) for code are trained on large amounts of data that may contain copyrighted and licensed content, which motivates internal auditing methods that can test whether specific data points were included during training. In this work we conduct an exploratory evaluation of membership inference attacks (MIAs) as auditing signals for code-specialized LLMs. We compare a loss-based baseline to Polarized Augment Calibration (PAC) across three open models in the 3B to 4B parameter range (Mellum-4B, StarCoder2-3B, and SmolLM3-3B) using the Java subset of a contamination-controlled evaluation dataset. We find that PAC provides consistent improvements over the loss signal on the code models, and that near-member samples are detected almost as effectively as exact members. A stratified analysis shows that attack performance varies substantially with file properties, with the strongest separability on small-to-medium files and on code with higher alphanumeric content, and degradation on very large files. Motivated by the syntactic fragility of token-swap augmentation on code, we propose PAC-AST, an AST-guided augmentation scheme that generates syntactically valid neighbors. PAC-AST behaves better on larger and syntactically complex files, where token-swap PAC degrades, but underperforms in the smaller-file and alphanumeric-rich strata, due in part to a reduced effective mutation magnitude. Overall, the results indicate that (i) calibration-based signals can strengthen grey-box auditing for code models, (ii) dataset and program characteristics are major drivers of measured leakage, and (iii) code-specific augmentation is a promising direction but requires controlling perturbation magnitude and neighbor quality to yield stable gains.
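To make the contrast between a loss-based signal and a neighbor-calibrated signal concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation evaluated in the thesis: the abstract does not specify the exact PAC score, so the calibration here is simply the mean loss of token-swap neighbors minus the sample's own loss. All names (`LogProbFn`, `make_toy_logprob_fn`, `loss_score`, `token_swap_neighbor`, `calibrated_score`) are hypothetical, and a toy bigram lookup stands in for a real model's per-token log-probabilities.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical interface: maps a non-empty token sequence to per-token
# log-probabilities. A real audit would wrap a forward pass of the model.
LogProbFn = Callable[[Sequence[str]], List[float]]


def make_toy_logprob_fn(memorized: Sequence[str]) -> LogProbFn:
    """Toy stand-in for a model: bigrams seen in `memorized` get high probability."""
    bigrams: Set[Tuple[str, str]] = set(zip(memorized, memorized[1:]))

    def logprob_fn(tokens: Sequence[str]) -> List[float]:
        scores = [-1.0]  # the first token has no context in this toy model
        for prev, tok in zip(tokens, tokens[1:]):
            scores.append(-0.1 if (prev, tok) in bigrams else -3.0)
        return scores

    return logprob_fn


def loss_score(tokens: Sequence[str], logprob_fn: LogProbFn) -> float:
    """Loss-based baseline: mean negative log-likelihood (lower is more member-like)."""
    return -mean(logprob_fn(tokens))


def token_swap_neighbor(
    tokens: Sequence[str], n_swaps: int, rng: random.Random
) -> List[str]:
    """Return a perturbed copy with `n_swaps` random adjacent-token swaps."""
    out = list(tokens)
    for _ in range(n_swaps):
        if len(out) < 2:
            break
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def calibrated_score(
    tokens: Sequence[str],
    logprob_fn: LogProbFn,
    n_neighbors: int = 5,
    n_swaps: int = 1,
    seed: int = 0,
) -> float:
    """Simplified calibration (an assumption, not the PAC formula):
    mean neighbor loss minus the sample's own loss. Higher is more member-like."""
    rng = random.Random(seed)
    neighbor_losses = [
        loss_score(token_swap_neighbor(tokens, n_swaps, rng), logprob_fn)
        for _ in range(n_neighbors)
    ]
    return mean(neighbor_losses) - loss_score(tokens, logprob_fn)


if __name__ == "__main__":
    member = "public static void main ( String [ ] args )".split()
    non_member = "int total = left + right ;".split()
    toy_model = make_toy_logprob_fn(member)
    for name, sample in [("member", member), ("non-member", non_member)]:
        print(name, loss_score(sample, toy_model), calibrated_score(sample, toy_model))
```

Note that the token swaps in this sketch routinely produce code that no longer parses, which is the fragility the abstract cites as motivation for PAC-AST; an AST-guided variant would replace `token_swap_neighbor` with a mutation that keeps the neighbor syntactically valid.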
https://zenodo.org/records/18367988
https://doi.org/10.5281/zenodo.18367987