C. Vasilescu

Please Note

<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>

Bachelor thesis (1)

1 record found

The Illusion of Ability: The Poisoned Promise of LLM Performance

An Evaluation of the Min-K% Prob membership inference attack

Bachelor thesis (2026) - C. Vasilescu, M. Izadi, A. Al-Kaswan, J.B. Katzy, R.L. Lagendijk

Large Language Models are becoming increasingly popular in software engineering, yet the exact composition of their training data remains largely undisclosed. This opacity introduces risks regarding copyright infringement and benchmark contamination. In this work, we audit the susceptibility of different models (StarCoder2, Mellum, and SmolLM3) to Membership Inference Attacks on code files, specifically evaluating the Min-K% Prob method.
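For readers unfamiliar with the method, the core of Min-K% Prob can be sketched in a few lines: given the log-probability the model assigns to each token of a file, average the k% of tokens with the lowest log-probability. A higher (less negative) score suggests the file was seen during training. The function name `min_k_prob`, the default `k=0.2`, and the example log-probabilities below are illustrative assumptions, not values or code from the thesis.

```python
def min_k_prob(token_logprobs, k=0.2):
    """Mean log-probability of the k fraction of least likely tokens.

    token_logprobs: per-token log p(x_i | x_<i), assumed to come from the
    audited model (hypothetical input; obtaining it is out of scope here).
    k: fraction of tokens to keep, 0 < k <= 1 (illustrative default).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    # Keep at least one token so short inputs still yield a score.
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


# Made-up log-probabilities for illustration only.
seen_file = [-0.1, -0.2, -0.4, -0.3, -0.2]
unseen_file = [-0.1, -0.2, -5.0, -3.0, -0.3]
print(min_k_prob(seen_file, k=0.4), min_k_prob(unseen_file, k=0.4))
```

Thresholding this score turns it into a member/non-member decision; the threshold itself has to be chosen per model and dataset.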

We find that this approach is an effective auditing tool, achieving ROC-AUC scores of up to 0.793, although performance degrades as non-members become more similar to members. Classification is driven primarily by non-functional artifacts, such as license headers and package identifiers.
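As a reminder of what the reported metric measures, ROC-AUC equals the probability that a randomly chosen member receives a higher attack score than a randomly chosen non-member, with ties counted as half. The sketch below computes it directly from that definition; the function name `roc_auc` and the scores are hypothetical and do not reproduce any result from the thesis.

```python
def roc_auc(member_scores, non_member_scores):
    """ROC-AUC via pairwise comparison (Mann-Whitney U statistic).

    Returns the fraction of (member, non-member) pairs in which the
    member has the higher score; tied pairs count as 0.5.
    """
    if not member_scores or not non_member_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


# Made-up attack scores for illustration only.
members = [-1.0, -2.0, -3.0]
non_members = [-2.5, -4.0, -5.0]
print(roc_auc(members, non_members))
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, which puts the reported maximum of 0.793 in context. The pairwise loop is quadratic, so for large evaluation sets a rank-based implementation would be preferable.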

Furthermore, we investigate post-training quantization as an attack accelerator. We find that the membership signal remains robust even when weights are compressed from 32-bit to 4-bit precision, and that using the 16-bit Brain Float (BF16) format reduces inference latency by a factor of 6, establishing Min-K% Prob (MKP) as a practical tool for assessing membership in models' training sets. ...