The Impact of Subsampling on Differentially Private Fraud Detection

Bachelor Thesis (2026)

Author(s)

S. van Wassenaar (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Erkin – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

N.M. Gürel – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Differential Privacy; Privacy Amplification

To reference this document use

https://resolver.tudelft.nl/uuid:618d45a7-8560-44b6-be5f-2ce9f49fd51b

More Info

Publication Year

2026

Language

English

Graduation Date

23-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Fraud detection is a critical task, but detecting fraud can be challenging due to class
imbalance. Furthermore, the availability of data is limited, because the data is highly
sensitive and cannot be shared between financial institutions due to privacy
regulations. One way to address this is by applying differential privacy. Differential
privacy is a framework in which a mechanism adds calibrated random noise so that a
formal privacy guarantee holds. However, the added noise can also reduce the utility
of the data.
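As a minimal illustration of the noise-utility tension described above, the sketch below applies the Laplace mechanism to a single numeric query, for example a count of fraudulent transactions (sensitivity 1). The thesis does not state that it uses this mechanism; it is a standard textbook example, and the function names are hypothetical.

```python
import math
import random


def noise_scale(sensitivity: float, epsilon: float) -> float:
    """Laplace scale b = sensitivity / epsilon; a smaller epsilon means more noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """One draw from Laplace(0, scale), built as the difference of two exponentials."""
    # If E1, E2 ~ Exponential(mean=scale) are independent, then E1 - E2 ~ Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value under epsilon-DP by adding Laplace(sensitivity / epsilon) noise."""
    return true_value + laplace_noise(noise_scale(sensitivity, epsilon), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical query: a count of fraudulent transactions (sensitivity 1).
    for eps in (0.1, 1.0, 10.0):
        b = noise_scale(1.0, eps)
        print(f"epsilon={eps}: scale={b}, std={math.sqrt(2) * b:.2f}")
```

Because the standard deviation of Laplace noise is sqrt(2) times the scale b, lowering the privacy budget by a factor of ten makes the released value ten times noisier.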

Privacy amplification techniques, such as subsampling, have been introduced to
strengthen the privacy guarantee without adding more noise. This paper investigates
how privacy amplification by subsampling affects the privacy-utility trade-off in
differentially private fraud detection.
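A minimal sketch of amplification by subsampling, assuming Poisson subsampling (each record is kept independently with probability q) and add/remove-one neighbouring datasets. Under these assumptions a standard result states that an (epsilon, delta)-DP mechanism run on the subsample satisfies (ln(1 + q(e^epsilon - 1)), q * delta)-DP. The thesis may use a different sampling scheme or privacy accountant; the helper names here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def poisson_subsample(records: Sequence[T], q: float, rng: random.Random) -> List[T]:
    """Keep each record independently with probability q (Poisson subsampling)."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    return [r for r in records if rng.random() < q]


def amplified_budget(epsilon: float, delta: float, q: float) -> Tuple[float, float]:
    """Guarantee of an (epsilon, delta)-DP mechanism run on a Poisson subsample of rate q.

    eps' = ln(1 + q * (e^eps - 1)), delta' = q * delta (add/remove-one neighbours).
    log1p/expm1 keep the result accurate for small q and small epsilon.
    """
    return math.log1p(q * math.expm1(epsilon)), q * delta


def base_epsilon_for_target(target_epsilon: float, q: float) -> float:
    """Largest epsilon the inner mechanism may use so the subsampled run meets target_epsilon."""
    # Inverts eps' = ln(1 + q * (e^eps - 1)) for eps.
    return math.log1p(math.expm1(target_epsilon) / q)
```

Inverting the bound shows how subsampling can reduce noise: for a fixed overall budget, the inner mechanism may use a larger epsilon, and hence less noise, at the cost of training on fewer records.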

To investigate this, an experiment is performed in which logistic regression models
are trained using different subsampling rates and privacy budgets.

The results show that subsampling can improve model performance in high-privacy
settings (small privacy budgets). However, this improvement lies primarily in
distinguishing between fraudulent and legitimate transactions, rather than in
detecting more fraudulent transactions. Therefore, whether subsampling is effective
depends on the application and on the relative costs of false positives and false
negatives.
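To make the final point concrete, here is a minimal sketch assuming a simple linear cost model, in which every false positive (a legitimate transaction flagged as fraud) costs cost_fp and every false negative (a missed fraud) costs cost_fn. The Confusion class, the cost values, and any counts passed in are hypothetical and are not results from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary fraud classifier (positive = fraud)."""
    tp: int
    fp: int
    fn: int
    tn: int

    def recall(self) -> float:
        """Fraction of fraudulent transactions that are flagged."""
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def precision(self) -> float:
        """Fraction of flagged transactions that are actually fraudulent."""
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0


def expected_cost(c: Confusion, cost_fp: float, cost_fn: float) -> float:
    """Total misclassification cost: each false alarm costs cost_fp, each missed fraud cost_fn."""
    return cost_fp * c.fp + cost_fn * c.fn


def preferred(a: Confusion, b: Confusion, cost_fp: float, cost_fn: float) -> str:
    """Return 'a' or 'b', whichever model has the lower total cost (ties go to 'a')."""
    return "a" if expected_cost(a, cost_fp, cost_fn) <= expected_cost(b, cost_fp, cost_fn) else "b"
```

With fixed confusion counts, the preferred model can flip when the ratio between cost_fn and cost_fp changes: a model with fewer false alarms wins when false positives are expensive, while a model that catches more fraud wins when missed fraud dominates the cost.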

Files

Research_Paper_Storm.pdf

(pdf | 0.571 Mb)

License info not available