Federated learning from non-iid data

Improving accuracy through data-augmentation and communication efficiency

Master Thesis (2022)

Author(s)

I. Cornelis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Y. Chen – Mentor (TU Delft - Data-Intensive Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Data Augmentation, Federated Learning, Non-iid, Data heterogeneity, Communication efficiency

To reference this document use:

https://resolver.tudelft.nl/uuid:c192ecaf-e8d4-42ab-8a12-d52ab9bd7e53

Publication Year

2022

Language

English

Graduation Date

16-02-2022

Awarding Institution

Delft University of Technology

Programme

Electrical Engineering | Embedded Systems

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Federated learning allows multiple parties to collaboratively develop a deep learning model without sharing private data. Models can be trained on the most up-to-date data while taking unique data that is not publicly available into account. However, the distributed nature of federated learning also causes problems: clients are not guaranteed to hold independent and identically distributed (iid) data, which degrades performance.

This work analyzes existing methods of generating such skewed datasets and finds that the Earth Mover's Distance (EMD) can be used to compare them. A novel scheme called phase-shift is introduced, which allows clients to communicate more frequently without increasing the total amount of communication, thereby reducing the drift caused by non-iid data. Finally, we propose a data-driven approach that reduces data skew by supplementing local datasets with augmented data. A novel method of balancing unaltered and augmented data is introduced, which takes the skew of the dataset into account.
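As an illustration of how EMD can quantify label skew, the sketch below compares each client's label distribution with the pooled distribution over all clients. The abstract does not state the exact EMD formulation used in the thesis, so this is an assumption: it uses a unit ground distance between distinct classes, in which case EMD reduces to the total variation distance. The function names (`label_distribution`, `label_emd`, `client_skews`) are hypothetical and chosen only for this example.

```python
from collections import Counter


def label_distribution(labels, num_classes):
    """Return the relative frequency of each class in `labels`."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def label_emd(p, q):
    """EMD between two label distributions.

    Assumption: every pair of distinct classes is at distance 1, so the
    EMD equals the total variation distance, 0.5 * sum(|p_i - q_i|).
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def client_skews(client_labels, num_classes):
    """Skew of each client relative to the pooled label distribution."""
    pooled = [y for labels in client_labels for y in labels]
    global_dist = label_distribution(pooled, num_classes)
    return [
        label_emd(label_distribution(labels, num_classes), global_dist)
        for labels in client_labels
    ]


if __name__ == "__main__":
    # Two clients that each hold a single class: maximally skewed split.
    skewed = [[0, 0, 0, 0], [1, 1, 1, 1]]
    # Two clients with identical class mixes: an iid-like split.
    balanced = [[0, 1, 0, 1], [1, 0, 1, 0]]
    print(client_skews(skewed, num_classes=2))
    print(client_skews(balanced, num_classes=2))
```

Under this assumption, a value of 0 means a client's label mix matches the pooled data, and larger values indicate stronger label skew, which makes partitions produced by different skew-generation methods comparable on one scale.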

Empirical analysis shows that phase-shift can reduce the instantaneous communication load on the system by 37.5% without suffering a performance loss or reducing the convergence rate. Evaluation of data augmentation on a heavily skewed CIFAR-10 dataset shows that accuracy is improved by 10%. Finally, phase-shift and data augmentation are combined, resulting in a 13% accuracy improvement, surpassing algorithms such as FedNova and FedProx when dealing with label heterogeneity.
