On the Gap Between Diffusion and Transformer Multi-Tabular Generation

Conference Paper (2025)

Author(s)

Gijs Paardekooper (Cross Options)

Jeroen M. Galjaard (TU Delft - Data-Intensive Systems)

Lydia Y. Chen (University of Neuchâtel)

Research Group

Data-Intensive Systems

DOI related publication

https://doi.org/10.1145/3746252.3761530

Keywords: Clustering, Diffusion, Transformer, Layer sharing, Multi-tabular, Synthetic tabular data

To reference this document use:

https://resolver.tudelft.nl/uuid:e917fad3-0bef-4a0e-98a6-c86011108f68

More Info

Publication Year

2025

Language

English

Pages (from-to)

5947-5954

Publisher

ACM

ISBN (electronic)

9798400720406

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Shareable tabular data is of high importance in industry and research. While generating synthetic records is well-studied, research has only recently extended to relational data synthesis. In the tabular generation setting, diffusion and transformer models exhibit superior performance over prior art. However, in the relational setting, diffusion models outperform transformers. This work focuses on the performance gap between tabular transformers and diffusion models in single-table (tabular) and multi-table (relational) settings, using REaLTabformer and ClavaDDPM as representative state-of-the-art models. We evaluate these architectures on a set of single- and multi-table datasets, highlighting the root causes of the gap between the methods. In our experiments, we attribute this difference to the influence of contextual information and data representation. To bridge the gap in the relational setting, we propose two seemingly simple strategies: layer sharing and contextual cues. This work offers insights into key design considerations for single- and multi-table generative models, including the incorporation of contextual information and the reuse of existing knowledge. With the proposed methods, we achieve improvements of 1.52× and 1.94× for the Logistic Detection and Discriminator Measure metrics, respectively.

Files

3746252.3761530.pdf

(pdf | 1.84 Mb)