Generative Federated Learning Approaches for Non-IID Data
Enhancing Federated Models with Synthetic Data
P.K. Cho (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Swier Garst – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
David M. J. Tax – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
A. Voulimeneas – Graduation committee member (TU Delft - Cyber Security)
Abstract
Federated Learning (FL) is a machine learning approach that has gained considerable interest in recent years. FL trains a global model without compromising the privacy of the clients' training data: the global model is sent to each client, which updates the weights locally, and only the learned weights are propagated back to a central server. However, the approach has limitations, as several challenges hinder model performance. One of these is non-IID (not Independent and Identically Distributed) training data. Most real-world data is non-IID, and this imbalance in data distribution has been shown to significantly degrade model performance. To address this issue, we propose a generative federated learning approach that pre-trains the global model on synthetic data produced by a generative model fitted to the collective distribution of all clients' training datasets. Our results show that this approach bridges the performance gap between IID and non-IID settings in FL, except in certain extreme non-IID cases.
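The training loop described above can be sketched as follows. This is a minimal illustration, not the thesis's actual method: the logistic-regression client model, the two-client non-IID split, and all function names are assumptions made for the example. The key idea shown is that the server initializes the global weights by pre-training on synthetic data drawn from the pooled distribution before federated averaging begins.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, data, labels, lr=0.1, epochs=5):
    """One client's local training: logistic-regression gradient steps
    on its private data. Only the resulting weights leave the client."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(data @ w)))
        grad = data.T @ (preds - labels) / len(labels)
        w -= lr * grad
    return w

def fed_avg(global_w, client_datasets, rounds=5):
    """Federated averaging: each round, every client trains locally and
    the server averages the returned weights (weighted by dataset size)."""
    w = global_w
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in client_datasets]
        sizes = [len(y) for _, y in client_datasets]
        w = np.average(updates, axis=0, weights=sizes)
    return w

def make_client(label, n=100):
    """Hypothetical non-IID client: holds samples of a single class only."""
    X = rng.normal(loc=2.0 * label - 1.0, scale=1.0, size=(n, 2))
    y = np.full(n, float(label))
    return X, y

clients = [make_client(0), make_client(1)]  # extreme label skew

# Pre-train the global model on synthetic data mimicking the *pooled*
# distribution (here simulated by sampling both classes), as a stand-in
# for samples from a generative model.
X_syn = np.vstack([make_client(0, 50)[0], make_client(1, 50)[0]])
y_syn = np.concatenate([np.zeros(50), np.ones(50)])
w0 = local_update(np.zeros(2), X_syn, y_syn, epochs=20)

w_final = fed_avg(w0, clients)
```

In a real deployment the generative model would be trained under the same privacy constraints and its samples would replace `X_syn`/`y_syn`; the pre-trained `w0` simply gives federated averaging a starting point that already reflects the overall data distribution.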