TabVFL: Improving Latent Representation Learning in Vertical Federated Learning

None, None

TabVFL: Improving Latent Representation Learning in Vertical Federated Learning

Master Thesis (2023)

Author(s)

M. Rashad (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jérémie Decouchant – Graduation committee member (TU Delft - Data-Intensive Systems)

Y. Chen – Mentor (TU Delft - Data-Intensive Systems)

Zilong Zhao – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Copyright

Autoencoder Tabular data Vertical federated learning

To reference this document use:

https://resolver.tudelft.nl/uuid:061ef047-987d-4e56-8d21-1cdc3206fd0b

More Info

expand_more

Publication Year

2023

Language

English

Copyright

Graduation Date

14-11-2023

Awarding Institution

Delft University of Technology

Programme

['Computer Science']

Faculty

Electrical Engineering, Mathematics and Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Autoencoders are popular neural networks that are able to compress high dimensional data to extract relevant latent information. TabNet is a state-of-the-art neural network model designed for tabular data that utilizes an autoencoder architecture for training. Vertical Federated Learning (VFL) is an emerging distributed machine learning paradigm that allows multiple parties to train a model collaboratively on vertically partitioned data while maintaining data privacy. The existing design of training autoencoders in VFL is to train a separate autoencoder in each participant and aggregate the latent representation later. This design could potentially break important correlations between feature data of participating parties, as each autoencoder is trained on locally available features while disregarding the features of others. In addition, traditional autoencoders are not specifically designed for tabular data, which is ubiquitous in VFL settings. Moreover, the impact of client failures during training on the model robustness is under-researched in the VFL scene. In this paper, we propose TabVFL, a distributed framework designed to improve latent representation learning using the joint features of participants. The framework (i) preserves privacy by mitigating potential data leakage with the addition of a fully-connected layer, (ii) conserves feature correlations by learning one latent representation vector, and (iii) provides enhanced robustness against client failures during training phase. Extensive experiments on five classification datasets show that TabVFL can outperform the prior work design, with 26.12% of improvement on f1-score.

Files

Master_Thesis_TabVFL_Mohamed_R... (pdf)

(pdf | 2.2 Mb)

License info not available