Title: Ripple Watermarking for Latent Tabular Diffusion Models
Author: Tang, Jiayi (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Chen, Lydia Y. (mentor); Anand, A. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2024-06-17

Abstract
Synthetic tabular data generated by tabular generative models represent an effective means of augmenting and sharing data. It is of paramount importance to trace and audit such synthetic data, avoiding potential harms and risks associated with inappropriate usage. While watermarking techniques are increasingly used for synthetic images, little is known about how to watermark synthetic tables so that the watermarks are imperceptible to humans, detectable by algorithms, and robust against post-editing. In this paper, we present the first watermarking algorithm for tabular diffusion models, which inserts novel ripple watermarks into the latent space of tables. For every synthetic table, the watermark starts from a central ring within the Fourier-transformed latent of the table and extends gradually across a large portion of the space. The watermark can be detected by calculating the distance between the Fourier-transformed tabular latent and the ground-truth watermark patch. Additionally, we develop post-editing attacks, including row/column/value deletion and distortion, to evaluate the robustness of the watermark. Our evaluation on four datasets demonstrates that our watermarking scheme effectively preserves the quality of synthetic tables in terms of resemblance, discriminability, and downstream utility. The average quality difference is less than 0.6% compared to non-watermarked data, while maintaining high detectability, with average statistical p-values over 25× lower than 0.02. Additionally, our robustness analysis shows that the watermark is resilient against various post-editing actions, with 85% of the p-values remaining below 0.05 across all 18 attack settings on four datasets.

Subject: Watermarking; Tabular Data Synthesis; Diffusion Models
To reference this document use: http://resolver.tudelft.nl/uuid:5b7a4484-2811-4a04-a39d-c629bc9f12ee
Part of collection: Student theses
Document type: master thesis
Rights: © 2024 Jiayi Tang
Files: PDF Master-thesis-jiayi-Tang.pdf (855.99 KB)
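
The embedding and detection idea summarised in the abstract (a ring-shaped patch written into the Fourier-transformed latent, then detected by a distance to the ground-truth patch) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the single fixed ring, the radii, the 32×32 latent size, and the mean-absolute-difference distance are all assumptions chosen for illustration, and the p-value test and the gradual "ripple" extension mentioned in the abstract are left out.

```python
import numpy as np

def ring_mask(shape, r_inner, r_outer):
    """Boolean mask selecting an annulus around the centre of a 2-D grid."""
    rows, cols = shape
    y, x = np.ogrid[:rows, :cols]
    dist = np.hypot(y - rows // 2, x - cols // 2)
    return (dist >= r_inner) & (dist <= r_outer)

def embed(latent, patch, mask):
    """Overwrite the masked (centred) Fourier coefficients of `latent` with `patch`."""
    spectrum = np.fft.fftshift(np.fft.fft2(latent))
    spectrum[mask] = patch[mask]
    # Assumption: `patch` is the spectrum of a real array and the mask is
    # symmetric about the centre, so the inverse transform is real up to rounding.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real

def distance(latent, patch, mask):
    """Mean absolute difference between the latent's spectrum and the patch on the ring."""
    spectrum = np.fft.fftshift(np.fft.fft2(latent))
    return float(np.mean(np.abs(spectrum[mask] - patch[mask])))

# Hypothetical setup: a random 32x32 latent and a key-derived random patch.
rng = np.random.default_rng(0)
latent = rng.standard_normal((32, 32))
patch = np.fft.fftshift(np.fft.fft2(rng.standard_normal((32, 32))))
mask = ring_mask(latent.shape, r_inner=2, r_outer=8)

marked = embed(latent, patch, mask)
d_marked = distance(marked, patch, mask)   # close to zero
d_clean = distance(latent, patch, mask)    # clearly larger
```

In this toy setting a detector would flag a latent as watermarked when its distance falls below a threshold; choosing that threshold statistically is where a p-value test such as the one reported in the abstract would come in.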