UniformGAN: generative adversarial networks in uniform probability spaces

Improving correlation by leveraging integral probability transform

Bachelor Thesis (2022)

Author(s)

M. Visser (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Y. Chen – Mentor (TU Delft - Data-Intensive Systems)

Z. Zhao – Mentor (TU Delft - Data-Intensive Systems)

K.G. Langendoen – Graduation committee member (TU Delft - Embedded Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

GAN, Neural networks, Distribution functions

To reference this document use:

https://resolver.tudelft.nl/uuid:0dc6e290-1a30-42e0-bd37-5d3dc6db95db

Publication Year

2022

Language

English

Graduation Date

23-06-2022

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Abstract

Sharing data is becoming increasingly difficult due to the regulatory constraints imposed by the General Data Protection Regulation (GDPR). Businesses are not allowed to share data that contains privacy-sensitive information. Synthetic data generation has emerged as a solution to this problem. State-of-the-art generative adversarial networks (GANs) can generate synthetic data that statistically resembles the original data while altering privacy-sensitive information so that it cannot be traced back to an individual.

However, generating synthetic data is still a time-consuming process for data scientists.

One challenge in synthetic data generation is modeling the raw data appropriately: transforming it into a numerical representation and specifying hyper-parameters, such as which columns are categorical, mixed-type, numerical, or log-distributed, is a non-trivial task. Another challenge is estimating the underlying distributions of the data and how these distributions are correlated.
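Deciding which columns are categorical, mixed-type, numerical, or log-distributed can be approximated with simple rules. The sketch below is hypothetical: `infer_column_kind`, its thresholds (`max_categories`, `mode_share`, `skew_threshold`), and the order of the checks are assumptions for illustration, not the rules UniformGAN actually uses.

```python
from collections import Counter
from statistics import mean, pstdev


def _is_number(value):
    """True if the raw value parses as a float (e.g. 3 or "3.5")."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def infer_column_kind(values, max_categories=10, mode_share=0.3, skew_threshold=2.0):
    """Guess whether a raw column is 'categorical', 'mixed', 'log' or 'numerical'.

    Hypothetical heuristic: the thresholds are illustrative defaults only.
    """
    # Any non-numeric entry (e.g. "red") makes the whole column categorical.
    if not all(_is_number(v) for v in values):
        return "categorical"
    xs = [float(v) for v in values]
    counts = Counter(xs)
    # Numeric columns with only a handful of distinct values behave like labels.
    if len(counts) <= max_categories:
        return "categorical"
    # One value (often 0) dominating an otherwise continuous column: mixed type.
    _, top_count = counts.most_common(1)[0]
    if top_count / len(xs) >= mode_share:
        return "mixed"
    # Strictly positive, strongly right-skewed columns are log-transform candidates.
    mu, sd = mean(xs), pstdev(xs)
    skew = mean(((x - mu) / sd) ** 3 for x in xs) if sd > 0 else 0.0
    if min(xs) > 0 and skew > skew_threshold:
        return "log"
    return "numerical"
```

For example, `infer_column_kind(["red", "blue", "red"])` returns `"categorical"`, while a column of 40 zeros followed by the values 1 to 60 returns `"mixed"`, because a single value covers 40% of its rows.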

The proposed solution, UniformGAN, addresses these issues with a data transformer that handles raw data, detects each column's data type, and transforms the column into a numerical equivalent. It uses the detected data type and the estimated distribution to set the hyper-parameters for categorical, mixed, and log columns.
Furthermore, it estimates the underlying distributions of the data and applies a statistical transformation, the probability integral transform, which maps each variable into a uniform probability space so that the machine learning model can more easily learn the dependence structure between variables.
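Under the probability integral transform, a continuous variable X with cumulative distribution function F is mapped to U = F(X), which is uniformly distributed on [0, 1]; applying the inverse F^-1 maps uniform values back to the original scale. Because F is monotone, ranks, and therefore rank correlations between columns, are preserved. The abstract does not state which distribution families UniformGAN fits, so this minimal sketch swaps in the empirical CDF of each column. The name `fit_empirical_pit` is hypothetical, and the assumption is that a generator trained on the uniform columns would have its outputs passed through `from_uniform`.

```python
import bisect
import math


def fit_empirical_pit(sample):
    """Return (to_uniform, from_uniform) for one numeric column.

    to_uniform is the empirical CDF (probability integral transform);
    from_uniform is its linearly interpolated inverse (quantile function).
    """
    xs = sorted(float(v) for v in sample)
    n = len(xs)

    def to_uniform(x):
        # Rank of x within the sample, divided by n + 1 so results stay below 1.
        return bisect.bisect_right(xs, x) / (n + 1)

    def from_uniform(u):
        # Turn u back into a fractional index into xs, clamped to the sample range.
        pos = min(max(u * (n + 1) - 1, 0.0), n - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        # Interpolate between the two neighbouring order statistics.
        return xs[lo] + frac * (xs[hi] - xs[lo])

    return to_uniform, from_uniform
```

For the column [12,000, 35,000, 18,500, 250,000, 41,000], the outlier 250,000 maps to 5/6 (about 0.83), exactly one rank step of 1/6 above its nearest neighbour. This illustrates why uniform margins can be easier for a network to model than raw heavy-tailed values.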

The evaluation of machine learning utility, statistical similarity, and privacy preservability shows that UniformGAN improves decision tree classification accuracy, raising averaged machine learning utility by 2% compared to CTAB-GAN and by 19.21% compared to CopulaGAN, while maintaining statistical similarity and privacy preservability on par with state-of-the-art tabular data modeling techniques.
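Machine learning utility is commonly measured by training the same classifiers once on real and once on synthetic training data, scoring both on a held-out real test set, and averaging the score differences. The abstract does not spell out UniformGAN's exact protocol, so the following is a minimal sketch under that assumption; `utility_gap` is a hypothetical helper, and in this convention a smaller gap means higher utility.

```python
from statistics import mean


def utility_gap(real_scores, synthetic_scores):
    """Average absolute score gap between models trained on real vs. synthetic data.

    Both dicts map a (classifier, metric) key, e.g. ("decision_tree", "accuracy"),
    to a score measured on the same held-out real test set. 0.0 means the
    synthetic data was exactly as useful as the real data; larger is worse.
    """
    # Comparing keys ensures every model/metric pair is scored in both settings.
    if real_scores.keys() != synthetic_scores.keys():
        raise ValueError("both score tables must cover the same classifiers and metrics")
    return mean(abs(real_scores[k] - synthetic_scores[k]) for k in real_scores)
```

Keying the tables by (classifier, metric) pairs lets a single average cover several models and metrics; the scores themselves can come from any classifier library.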

Files

UniformGAN.pdf (pdf | 0.612 Mb)

Poster_UniformGAN.jpg (jpg | 0.166 Mb)

Poster_UniformGAN.pdf (pdf | 0.108 Mb)

License info not available for any of these files.