Extending CTAB-GAN+ with StackGAN architecture

Abstract

Privacy regulations (e.g., the European General Data Protection Regulation) often prevent valuable flows of data between stakeholders, so data synthesis can play a crucial role in sharing the value captured in data sets without sharing personal details. Several attempts have been made to solve this problem with Generative Adversarial Network (GAN) architectures. The aim of this study is to create an architecture that generates data resembling an original dataset more closely than the state of the art can achieve, while operating under the same privacy budget. We apply the concept of progressive learning to the state-of-the-art tabular GAN by stacking two CTAB-GANs sequentially. The Stage I CTAB-GAN is trained as usual, and the Stage II CTAB-GAN is trained on the output of the Stage I CTAB-GAN together with the same conditional vector that was used for Stage I. The Stage II CTAB-GAN can rectify defects introduced in Stage I and further refine the results, yielding a generative model that better captures the information in the original dataset. The challenge lies in creating the conditions under which the second GAN can actually improve on the first GAN's results. The results on four datasets show that better data synthesis can be achieved with a second GAN that has a fully connected generator, as opposed to a copy of the original CTAB-GAN generator.
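
The stacked data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dimensions, and the single-layer stand-in for the Stage I generator are all assumptions made for illustration, the weights are random and untrained, and the discriminators and training loop are omitted. The sketch only shows that Stage II receives the Stage I output concatenated with the same conditional vector and passes it through fully connected layers.

```python
import random


def make_dense(in_dim, out_dim, rng):
    """Create one fully connected layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def dense(layer, x, relu=False):
    """Apply a fully connected layer to vector x, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def stage1_generate(noise, cond, layer):
    """Hypothetical stand-in for the Stage I generator: noise + condition -> coarse row."""
    return dense(layer, noise + cond)


def stage2_generate(stage1_row, cond, hidden, output):
    """Fully connected Stage II generator (assumed layout): refines the Stage I row,
    conditioned on the same conditional vector that Stage I received."""
    h = dense(hidden, stage1_row + cond, relu=True)
    return dense(output, h)


# Illustrative sizes only; none of these values come from the source.
NOISE_DIM, COND_DIM, ROW_DIM, HIDDEN_DIM = 4, 3, 5, 8

rng = random.Random(0)
g1 = make_dense(NOISE_DIM + COND_DIM, ROW_DIM, rng)
g2_hidden = make_dense(ROW_DIM + COND_DIM, HIDDEN_DIM, rng)
g2_out = make_dense(HIDDEN_DIM, ROW_DIM, rng)

noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
cond = [0.0, 1.0, 0.0]  # one-hot conditional vector shared by both stages

coarse_row = stage1_generate(noise, cond, g1)
refined_row = stage2_generate(coarse_row, cond, g2_hidden, g2_out)
```

In this sketch the Stage II generator never sees the noise vector directly; its only inputs are the Stage I row and the shared conditional vector, which mirrors the sequential arrangement the abstract describes.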