The Impact of Subsampling on Differentially Private Fraud Detection

Bachelor Thesis (2026)
Author(s)

S. van Wassenaar (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Erkin – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

N.M. Gürel – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
23-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Fraud detection is a critical task, but it is challenging because fraudulent
transactions are rare compared to legitimate ones (class imbalance). Furthermore, the
available data is limited: it is highly sensitive and cannot be shared between financial
institutions due to privacy regulations. One way to address this is differential privacy,
a formal privacy definition that is satisfied by adding carefully calibrated noise (for
example to the data or to the training process) so that a quantifiable privacy guarantee
holds. However, the added noise can also reduce the utility of the resulting model.
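As a minimal illustration of "calibrated noise", the sketch below implements the classic Laplace mechanism for a single numeric query. This is an assumption for illustration only; it is not necessarily how noise is added in this thesis. The noise scale is sensitivity / epsilon, so a smaller privacy budget epsilon gives more noise. The example count of fraudulent transactions is hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release `true_value` with epsilon-DP using the Laplace mechanism.

    sensitivity: maximum change in the query result when one record is
    added or removed (1 for a counting query).
    Smaller epsilon -> larger noise scale -> stronger privacy, lower utility.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must both be > 0")
    scale = sensitivity / epsilon
    return true_value + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    fraud_count = 42  # hypothetical number of fraudulent transactions
    for eps in (0.1, 1.0, 10.0):
        noisy = laplace_mechanism(fraud_count, 1.0, eps, rng)
        print(f"epsilon={eps}: noisy count={noisy:.2f}")
```

For large epsilon the released count stays close to 42. For epsilon = 0.1 the expected absolute error is 10, which illustrates the privacy-utility trade-off described above.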

Privacy amplification techniques, such as subsampling, have been introduced to
strengthen the privacy guarantee without adding more noise. This paper
investigates how privacy amplification by subsampling affects the privacy-utility
trade-off in differentially private fraud detection.
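The amplification effect can be made concrete with a standard bound. The sketch assumes Poisson subsampling, where each record is included independently with probability q, and neighbouring datasets that differ by adding or removing one record. The thesis may use a different sampling scheme or accounting method. Under these assumptions, running an epsilon-DP mechanism on the subsample is epsilon'-DP on the full dataset, with epsilon' = ln(1 + q(e^epsilon - 1)). For (epsilon, delta)-DP, delta also scales to q * delta. The function names are hypothetical.

```python
import math


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Privacy amplification by Poisson subsampling (pure DP, add/remove neighbours).

    An epsilon-DP mechanism applied to a subsample that includes each record
    independently with probability q is log(1 + q * (e^epsilon - 1))-DP.
    """
    if epsilon < 0 or not 0 < q <= 1:
        raise ValueError("need epsilon >= 0 and 0 < q <= 1")
    return math.log1p(q * math.expm1(epsilon))


def base_epsilon_for_target(target_epsilon: float, q: float) -> float:
    """Inverse of amplified_epsilon: the per-run budget the mechanism may use
    on the subsample so that the overall guarantee is `target_epsilon`."""
    if target_epsilon < 0 or not 0 < q <= 1:
        raise ValueError("need target_epsilon >= 0 and 0 < q <= 1")
    return math.log1p(math.expm1(target_epsilon) / q)


if __name__ == "__main__":
    for q in (1.0, 0.5, 0.1, 0.01):
        print(f"q={q}: eps'={amplified_epsilon(1.0, q):.4f}, "
              f"budget for eps'=0.5: {base_epsilon_for_target(0.5, q):.4f}")
```

For example, with q = 0.1 a mechanism run at epsilon = 1 on the subsample satisfies roughly 0.159-DP on the full dataset. Read the other way around, a fixed overall budget allows less noise per run, but the model then sees fewer records. That tension is the trade-off this work studies.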

To investigate this, an experiment is performed in which logistic regression
models are trained using different subsampling rates and privacy budgets.

The results show that subsampling can improve the performance of a model in
highly private settings (small privacy budgets). However, this improvement lies primarily in distinguishing between fraudulent and legitimate transactions, rather than in detecting more fraudulent
transactions. Therefore, whether subsampling is effective depends on the application
and on the costs of false positives and false negatives.
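The dependence on error costs can be shown with a small cost comparison. The sketch compares two hypothetical confusion matrices, labelled "baseline" and "subsampled". Their numbers are invented for illustration and are not results from this work. In this toy example the second model raises fewer false alarms but misses slightly more fraud. Which model is preferable flips depending on the assumed per-error costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # fraud correctly flagged
    fp: int  # legitimate transaction flagged as fraud (false alarm)
    fn: int  # fraud missed
    tn: int  # legitimate transaction correctly passed


def expected_cost(c: Confusion, cost_fp: float, cost_fn: float) -> float:
    # Total misclassification cost: each false alarm costs cost_fp
    # (e.g. a manual review), each missed fraud costs cost_fn (e.g. a loss).
    return c.fp * cost_fp + c.fn * cost_fn


def preferred(a: Confusion, b: Confusion, cost_fp: float, cost_fn: float) -> str:
    ca, cb = expected_cost(a, cost_fp, cost_fn), expected_cost(b, cost_fp, cost_fn)
    return "a" if ca < cb else "b" if cb < ca else "tie"


# Hypothetical numbers on 10,000 transactions, for illustration only.
baseline = Confusion(tp=60, fp=500, fn=40, tn=9400)
subsampled = Confusion(tp=58, fp=200, fn=42, tn=9700)

if __name__ == "__main__":
    for cost_fn in (10, 500):
        print(cost_fn, preferred(baseline, subsampled, cost_fp=1, cost_fn=cost_fn))
```

When a missed fraud costs 10 times a false alarm, the model with fewer false positives wins (620 vs. 900). When it costs 500 times as much, the model that misses fewer frauds wins (20,500 vs. 21,200).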

Files

Research_Paper_Storm.pdf
(pdf | 0.571 Mb)
License info not available