Improving medical data synthesis with DP-GAN and Deep Anomaly Detection


Abstract

Ensuring the privacy of medical data in a meaningful manner is a complex task. This domain presents a plethora of unique challenges: high stakes, vast differences between possible use cases, long-established methods that limit the number of feasible solutions, and more. Consequently, an effective approach to ensuring the privacy of medical data must be easy to adopt, offer robust privacy guarantees, and minimize the reduction in data utility.

The unique nature of medical data presents both distinct challenges and opportunities. We consider various types of correlations that significantly impact privacy guarantees. However, these same correlations can be exploited to train a model for removing anomalies and thereby enhance the utility of synthetic medical data.

This thesis proposes a framework compatible with state-of-the-art approaches to differentially private dataset release based on Generative Adversarial Networks (GANs). Our framework spends part of the privacy budget on training an unsupervised learning model that detects and removes anomalies. We evaluate the performance of the framework using a variety of machine-learning models and metrics. The final results show an improvement of up to 13% compared to approaches that do not use our framework, under the same privacy budget.
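The sketch below illustrates the budget-splitting idea described above, not the thesis's actual implementation. The functions `train_dp_anomaly_detector` and `train_dp_gan` are hypothetical placeholders for differentially private training routines (e.g., DP-SGD-based), and the non-private IsolationForest is used purely to show where anomaly filtering fits in the pipeline.

```python
# Minimal sketch, assuming a total (epsilon)-style privacy budget that is
# divided between an anomaly detector and a DP-GAN. All names below are
# illustrative placeholders, not an existing library API.

import numpy as np
from sklearn.ensemble import IsolationForest


def train_dp_anomaly_detector(real_data: np.ndarray, epsilon: float):
    """Hypothetical stand-in: fit an unsupervised anomaly detector under a
    privacy budget of `epsilon`. A plain (non-private) IsolationForest is
    used here only for illustration."""
    detector = IsolationForest(contamination=0.05, random_state=0)
    detector.fit(real_data)
    return detector


def train_dp_gan(real_data: np.ndarray, epsilon: float):
    """Hypothetical stand-in for a DP-GAN trainer. Returns a callable that
    samples synthetic records; here a simple Gaussian sampler is used as a
    placeholder for the trained generator."""
    rng = np.random.default_rng(0)
    mean, std = real_data.mean(axis=0), real_data.std(axis=0)
    return lambda n: rng.normal(mean, std, size=(n, real_data.shape[1]))


def release_synthetic_data(real_data: np.ndarray,
                           total_epsilon: float = 1.0,
                           detector_fraction: float = 0.2,
                           n_samples: int = 10_000) -> np.ndarray:
    # Split the overall privacy budget between the two components.
    eps_detector = detector_fraction * total_epsilon
    eps_gan = total_epsilon - eps_detector

    detector = train_dp_anomaly_detector(real_data, eps_detector)
    sample = train_dp_gan(real_data, eps_gan)

    # Generate synthetic records and keep only those the detector
    # labels as inliers (+1 in scikit-learn's convention).
    synthetic = sample(n_samples)
    keep = detector.predict(synthetic) == 1
    return synthetic[keep]
```

Under this sketch, the fraction of the budget allocated to the detector is a tunable trade-off: more budget for anomaly removal can improve the quality of the released records, at the cost of a noisier generator.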