Maverick Matters

Client Contribution and Selection in Federated Learning

Conference Paper (2023)
Author(s)

J. Huang (TU Delft - Data-Intensive Systems)

Chi Hong (TU Delft - Data-Intensive Systems)

Yang Liu (Tsinghua University)

Y. Chen (TU Delft - Data-Intensive Systems)

S. Roos (TU Delft - Data-Intensive Systems)

Research Group
Data-Intensive Systems
Copyright
© 2023 J. Huang, C. Hong, Yang Liu, Lydia Y. Chen, S. Roos
DOI (related publication)
https://doi.org/10.1007/978-3-031-33377-4_21
Publication Year
2023
Language
English
Pages (from-to)
269-282
ISBN (print)
978-3-031-33376-7
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Federated learning (FL) enables collaborative learning between parties, called clients, without sharing the original and potentially sensitive data. To ensure fast convergence in the presence of heterogeneous clients, it is imperative to select, at the right time, clients who can effectively contribute to learning. A realistic but overlooked case of heterogeneous clients is Mavericks, which monopolize the possession of certain data types; e.g., children's hospitals possess most of the data on pediatric cardiology. In this paper, we demonstrate the importance of Mavericks and tackle the challenges they pose by exploring two types of client selection strategies. First, we show theoretically and through simulations that the common contribution-based approach, Shapley Value, underestimates the contribution of Mavericks and is hence not an effective measure for selecting clients. Then, we propose FedEMD, an adaptive strategy with competitive overhead based on the Wasserstein distance, supported by a proven convergence bound. Because FedEMD adapts the selection probability such that Mavericks are preferably selected when the model benefits from improvement on rare classes, it consistently ensures fast convergence in the presence of different types of Mavericks. Compared to existing strategies, including Shapley Value-based ones, FedEMD improves the convergence speed of neural network classifiers with FedAvg aggregation by 26.9%, and its performance is consistent across various levels of heterogeneity.
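To illustrate the core idea behind EMD-based selection, the following is a minimal Python sketch, not the paper's exact FedEMD formulation: each client receives a selection probability that grows with the Wasserstein distance between its label distribution and the global one, so Mavericks holding rare classes are favored early, with the boost decaying over rounds. The function name, the decay schedule, and the use of the mean of client distributions as the global distribution are illustrative assumptions.

import numpy as np
from scipy.stats import wasserstein_distance

def selection_probabilities(client_label_dists, round_t, decay=0.95):
    # client_label_dists: list of length-K arrays, each summing to 1
    # (per-client class histograms); round_t: current training round.
    classes = np.arange(len(client_label_dists[0]))
    # Assumption: the global distribution is approximated by the unweighted
    # mean of client distributions (i.e., equal-sized local datasets).
    global_dist = np.mean(client_label_dists, axis=0)
    # 1-D earth mover's distance between each client's label distribution
    # and the global one; large for Mavericks that hold rare classes.
    emd = np.array([wasserstein_distance(classes, classes, d, global_dist)
                    for d in client_label_dists])
    # Boost high-EMD clients early; the boost decays toward uniform
    # selection as training progresses.
    scores = 1.0 + (decay ** round_t) * emd
    return scores / scores.sum()

# Example use: sample 10 clients for round t according to these weights.
# selected = np.random.choice(len(dists), size=10, replace=False,
#                             p=selection_probabilities(dists, t))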

Files

978_3_031_33377_4_21.pdf
(pdf | 0.786 MB)
License info not available