An Exploratory Study Into Polarized Augment Calibration for Membership Inference in Code LLMs

Bachelor Thesis (2026)

Author(s)

R. Koohestani (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Izadi – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.B. Katzy – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Al-Kaswan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

R.L. Lagendijk – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Data Governance, LLMs, Membership Inference Attacks

To reference this document use

https://resolver.tudelft.nl/uuid:a516cb1e-3450-4b24-b66a-271ba3331125

More Info

Publication Year

2026

Language

English

Graduation Date

28-01-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large Language Models (LLMs) for code are trained on large amounts of data that may contain copyrighted and licensed content, which motivates internal auditing methods that can test whether specific data points were included in training. In this work, we conduct an exploratory evaluation of membership inference attacks (MIAs) as auditing signals for code-specialized LLMs. We compare a loss-based baseline to Polarized Augment Calibration (PAC) across three open models in the 3B-4B parameter range (Mellum-4B, StarCoder2-3B, and SmolLM3-3B) using the Java subset of a contamination-controlled evaluation dataset. We find that PAC consistently improves over the loss signal on the code models, and that near-member samples are detected almost as effectively as exact members. A stratified analysis shows that attack performance varies substantially with file properties: separability is strongest on small-to-medium files and on code with higher alphanumeric content, and it degrades on very large files. Motivated by the syntactic fragility of token-swap augmentation on code, we propose PAC-AST, an AST-guided augmentation scheme that generates syntactically valid neighbors. PAC-AST behaves better on larger and syntactically complex files, where token-swap PAC degrades, but it underperforms in the smaller and alphanumeric-rich strata, due in part to a reduced effective mutation magnitude. Overall, the results indicate that (i) calibration-based signals can strengthen grey-box auditing for code models, (ii) dataset and program characteristics are major drivers of measured leakage, and (iii) code-specific augmentation is a promising direction but requires controlling perturbation magnitude and neighbor quality to yield stable gains.
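
To make the contrast between a raw loss signal and a calibrated signal concrete, the following is a minimal Python sketch and not the thesis implementation. It assumes a hypothetical `logprob_fn` that returns per-token log-probabilities, replaces the real models with a toy stand-in (`make_toy_logprob_fn`), and uses mean log-likelihood as the per-sample statistic, since the abstract does not specify PAC's exact scoring function. All function names and parameters (`n_neighbors`, `n_swaps`, `seed`) are illustrative assumptions.

```python
import random
from statistics import mean


def loss_score(tokens, logprob_fn):
    """Loss-style baseline: mean token log-probability (higher = more member-like)."""
    return mean(logprob_fn(tokens))


def token_swap(tokens, n_swaps, rng):
    """Build a neighbor by swapping n_swaps randomly chosen adjacent token pairs."""
    out = list(tokens)
    for _ in range(n_swaps):
        if len(out) < 2:
            break
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def calibrated_score(tokens, logprob_fn, n_neighbors=8, n_swaps=2, seed=0):
    """Calibrated signal: score of the sample minus the mean score of its neighbors.

    Assumption: mean log-likelihood stands in for the per-sample statistic.
    """
    rng = random.Random(seed)
    neighbors = [token_swap(tokens, n_swaps, rng) for _ in range(n_neighbors)]
    reference = mean(loss_score(nb, logprob_fn) for nb in neighbors)
    return loss_score(tokens, logprob_fn) - reference


def make_toy_logprob_fn(memorized):
    """Hypothetical stand-in for a model: confident only on bigrams it has 'seen'."""
    seen = set(zip(memorized, memorized[1:]))

    def logprob_fn(tokens):
        scores = [-1.0]  # fixed score for the first token, which has no context
        for pair in zip(tokens, tokens[1:]):
            scores.append(-0.1 if pair in seen else -3.0)
        return scores

    return logprob_fn


if __name__ == "__main__":
    member = "public static void main ( String [ ] args ) { }".split()
    non_member = "int x = y + z ; return x ;".split()
    toy = make_toy_logprob_fn(member)
    for name, sample in [("member", member), ("non-member", non_member)]:
        print(name, loss_score(sample, toy), calibrated_score(sample, toy))
```

In this toy setting, perturbing a memorized sequence lowers its likelihood, so the calibrated score of the member is positive, while the non-member and its neighbors are equally unlikely and the score stays at zero. Random token swaps can also yield code that no longer parses, which is the fragility that motivates the AST-guided neighbors described in the abstract.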

https://zenodo.org/records/18367988
https://doi.org/10.5281/zenodo.18367987

Files

BRP_Poisoned_Chalice_Roham_fin... (pdf)

(pdf | 0.414 MB)

License info not available