On the Gap Between Diffusion and Transformer Multi-Tabular Generation

Conference Paper (2025)
Author(s)

Gijs Paardekooper (Cross Options)

J.M. Galjaard (TU Delft - Data-Intensive Systems)

Lydia Y. Chen (University of Neuchâtel)

Research Group
Data-Intensive Systems
DOI related publication
https://doi.org/10.1145/3746252.3761530
Publication Year
2025
Language
English
Pages (from-to)
5947-5954
ISBN (electronic)
9798400720406
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Shareable tabular data is of high importance in industry and research. While generating synthetic records is well-studied, research has only recently extended to relational data synthesis. In the tabular generation setting, diffusion and transformer models exhibit superior performance over prior art. However, in the relational setting, diffusion models outperform transformers. This work focuses on the performance gap between tabular transformers and diffusion models in single-table (tabular) and multi-table (relational) settings, using REaLTabFormer and ClavaDDPM as representative state-of-the-art models. We evaluate these architectures on a set of single- and multi-table datasets, highlighting the root causes of the gap between the two methods. In our experiments, we attribute this difference to the influence of contextual information and data representation. To bridge the gap in the relational setting, we propose two seemingly simple strategies: layer sharing and contextual cues. This work offers insights into key design considerations for single- and multi-table generative models, including the incorporation of contextual information and the reuse of existing knowledge. With the proposed methods, we achieve improvements of 1.52× and 1.94× for the Logistic Detection and Discriminator Measure metrics, respectively.