FCT-GAN

Enhancing Global Correlation of Table Synthesis via Fourier Transform

Conference Paper (2023)
Author(s)

Zilong Zhao (TU Delft - Data-Intensive Systems)

Robert Birke (University of Turin)

Y. Chen (TU Delft - Data-Intensive Systems)

Research Group
Data-Intensive Systems
DOI related publication
https://doi.org/10.1145/3583780.3615202
Publication Year
2023
Language
English
Pages (from-to)
4450–4454
ISBN (print)
979-8-4007-0124-5
ISBN (electronic)
9798400701245
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Synthetic tabular data has emerged as an alternative way to share knowledge while complying with strict data access regulations, such as the European General Data Protection Regulation (GDPR). Mainstream table synthesizers build on Generative Adversarial Networks (GANs). Although several state-of-the-art (SOTA) tabular GAN algorithms inherit Convolutional Neural Network (CNN)-based architectures, which have proven effective for images, they tend to overlook two critical properties of tabular data: (i) the global correlation across columns, and (ii) the semantic invariance to the column order. Permuting the columns of a table does not alter the semantic meaning of the data, but the features extracted by CNNs can change significantly because of their limited convolution kernel size. To address these problems, we propose FCT-GAN, the first conditional tabular GAN to adopt Fourier networks for table synthesis. FCT-GAN enhances permutation-invariant GAN training by strengthening the learning of global correlations via Fourier layers. Extensive evaluation on benchmarks and real-world datasets shows that FCT-GAN can synthesize tabular data with better (up to 27.8%) machine learning utility (i.e., a proxy for global correlations) and higher (up to 26.5%) statistical similarity to real data. FCT-GAN also shows the least variation in synthetic data quality among 7 SOTA baselines across 3 different training-data column orders.
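
The contrast the abstract draws between a limited convolution kernel and a Fourier layer can be illustrated with a toy experiment. The sketch below is not the FCT-GAN architecture: it assumes a single table row, a hypothetical size-3 smoothing kernel, and a plain discrete Fourier transform across columns. The function names (`local_conv`, `fourier_mix`, `affected_outputs`) are illustrative only. It perturbs one column and counts how many output positions react, which is a rough stand-in for how far each operation "sees" across columns.

```python
import cmath

def local_conv(row, kernel=(0.25, 0.5, 0.25)):
    """1-D convolution across columns with zero padding (same output length)."""
    half = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half          # input column seen by this kernel tap
            if 0 <= j < n:
                acc += w * row[j]
        out.append(acc)
    return out

def fourier_mix(row):
    """Discrete Fourier transform across columns (complex-valued output)."""
    n = len(row)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * j / n) for j, x in enumerate(row))
        for k in range(n)
    ]

def affected_outputs(layer, row, col, delta=1.0, tol=1e-9):
    """Count output positions that change when one input column is perturbed."""
    bumped = list(row)
    bumped[col] += delta
    before, after = layer(row), layer(bumped)
    return sum(abs(a - b) > tol for a, b in zip(before, after))

# One hypothetical table row with 8 numeric columns (values are arbitrary).
row = [0.3, -1.2, 0.7, 2.1, -0.4, 1.5, 0.9, -0.8]

conv_reach = affected_outputs(local_conv, row, col=3)
fourier_reach = affected_outputs(fourier_mix, row, col=3)

print("outputs affected by one column (size-3 convolution):", conv_reach)
print("outputs affected by one column (Fourier transform):", fourier_reach)
```

With these assumptions, the convolution reacts at only 3 of the 8 output positions, so two columns placed far apart never meet in the same window, and whether they do depends on the column order. The Fourier transform reacts at all 8 positions, meaning every output mixes every column regardless of where it sits.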

Files

3583780.3615202.pdf
(pdf | 1.43 MB)
- Embargo expired on 21-04-2024
License info not available