Maverick Matters

Client Contribution and Selection in Federated Learning

Conference Paper (2023)

Author(s)

J. Huang (TU Delft - Data-Intensive Systems)

C. Hong (TU Delft - Data-Intensive Systems)

Yang Liu (Tsinghua University)

Lydia Y. Chen (TU Delft - Data-Intensive Systems)

Stefanie Roos (TU Delft - Data-Intensive Systems)

Research Group

Data-Intensive Systems

DOI related publication

https://doi.org/10.1007/978-3-031-33377-4_21

Keywords

Federated learning, Data heterogeneity, Client selection, Shapley value, Wasserstein distance

To reference this document use:

https://resolver.tudelft.nl/uuid:691514b3-285d-4a75-af82-12d53b3e16ce

Publication Year

2023

Language

English

Pages (from-to)

269-282

ISBN (print)

9783031333767

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Federated learning (FL) enables collaborative learning between parties, called clients, without sharing the original and potentially sensitive data. To ensure fast convergence when clients are heterogeneous, it is imperative to select, in a timely manner, the clients who can contribute effectively to learning. A realistic but overlooked class of heterogeneous clients is Mavericks, who monopolize the possession of certain data types; e.g., children's hospitals possess most of the data on pediatric cardiology. In this paper, we address the importance and tackle the challenges of Mavericks by exploring two types of client selection strategies. First, we show theoretically and through simulations that the common contribution-based approach, Shapley Value, underestimates the contribution of Mavericks and is hence not effective as a measure for selecting clients. Then, we propose FedEMD, an adaptive strategy with competitive overhead based on the Wasserstein distance, supported by a proven convergence bound. Because FedEMD adapts the selection probability such that Mavericks are preferentially selected when the model benefits from improvement on rare classes, it consistently ensures fast convergence in the presence of different types of Mavericks. Compared to existing strategies, including Shapley Value-based ones, FedEMD improves the convergence speed of neural network classifiers with FedAvg aggregation by 26.9%, and its performance is consistent across various levels of heterogeneity.
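
To make the role of the Wasserstein distance more concrete, the following minimal Python sketch measures how far each client's label distribution lies from the global one and turns those distances into selection probabilities. It is an illustration only, not the FedEMD algorithm from the paper. The function names, the toy client counts, the use of class indices as the ground metric, and the static proportional mapping from distance to probability are all assumptions made here; the adaptive behavior described in the abstract is not modeled.

```python
from itertools import accumulate


def label_distribution(counts):
    """Normalize per-class sample counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("a client must hold at least one sample")
    return [c / total for c in counts]


def wasserstein_1d(p, q):
    """1-D Wasserstein-1 distance between two distributions on classes 0..K-1.

    Assumption: class indices act as positions on a line, so the distance
    is the sum of absolute differences between the two cumulative sums.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same classes")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def selection_probabilities(distances):
    """Hypothetical rule: probability proportional to distance from global."""
    total = sum(distances)
    if total == 0:
        return [1 / len(distances)] * len(distances)
    return [d / total for d in distances]


# Toy data (hypothetical): two ordinary clients and one Maverick that
# holds every sample of class 2.
client_counts = [
    [50, 50, 0],
    [50, 50, 0],
    [0, 0, 100],
]

# Global label distribution, obtained by pooling the per-class counts.
global_counts = [sum(col) for col in zip(*client_counts)]
global_dist = label_distribution(global_counts)

distances = [
    wasserstein_1d(label_distribution(c), global_dist) for c in client_counts
]
probs = selection_probabilities(distances)

for i, (d, pr) in enumerate(zip(distances, probs)):
    print(f"client {i}: distance={d:.3f}, selection probability={pr:.3f}")
```

In this toy setting the Maverick lies farthest from the global distribution and therefore receives the largest selection probability. A static rule like this one would favor Mavericks in every round, whereas the abstract describes a strategy that prefers them only when the model benefits from improvement on rare classes.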

Files

978_3_031_33377_4_21.pdf

(pdf | 0.786 Mb)

License info not available