Title
Maverick Matters: Client Contribution and Selection in Federated Learning
Author
Huang, J. (TU Delft Data-Intensive Systems)
Hong, C. (TU Delft Data-Intensive Systems)
Liu, Yang (Tsinghua University)
Chen, Lydia Y. (TU Delft Data-Intensive Systems)
Roos, S. (TU Delft Data-Intensive Systems)
Contributor
Kashima, Hisashi (editor)
Ide, Tsuyoshi (editor)
Peng, Wen-Chih (editor)
Date
2023
Abstract
Federated learning (FL) enables collaborative learning between parties, called clients, without sharing the original and potentially sensitive data. To ensure fast convergence when clients are heterogeneous, it is imperative to select, in a timely manner, the clients that can effectively contribute to learning. Mavericks, clients that monopolize the possession of certain data types, are a realistic but overlooked case of heterogeneous clients; for example, children's hospitals possess most of the data on pediatric cardiology. In this paper, we examine the importance of Mavericks and tackle the challenges they pose by exploring two types of client selection strategies. First, we show theoretically and through simulations that the common contribution-based approach, Shapley Value, underestimates the contribution of Mavericks and is therefore not an effective criterion for selecting clients. Then, we propose FedEMD, an adaptive selection strategy based on the Wasserstein distance, with competitive overhead and a proven convergence bound. Because FedEMD adapts the selection probability so that Mavericks are preferentially selected when the model benefits from improvement on rare classes, it consistently ensures fast convergence in the presence of different types of Mavericks. Compared to existing strategies, including Shapley Value-based ones, FedEMD improves the convergence speed of neural network classifiers trained with FedAvg aggregation by 26.9%, and its performance is consistent across various levels of heterogeneity.
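The abstract does not spell out FedEMD's exact selection rule, so the following is only a hypothetical sketch of the general idea of distance-based client selection, not the paper's algorithm. It assumes each client reports per-class sample counts, measures the Wasserstein distance between each client's label distribution and the pooled distribution under a 0/1 ground metric between classes (under which Wasserstein-1 reduces to total variation distance), and mixes uniform selection with distance-proportional selection through a hypothetical `weight` parameter. All function names are illustrative.

```python
def normalize(counts):
    """Turn per-class sample counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]


def label_wasserstein(p, q):
    """Wasserstein-1 distance between two label distributions, ASSUMING a
    0/1 ground metric (cost 1 between any two distinct classes). Under that
    metric it equals the total variation distance 0.5 * sum |p_k - q_k|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def selection_probabilities(client_counts, weight):
    """Hypothetical selection rule (not FedEMD itself): blend uniform
    selection with selection proportional to each client's distance from
    the pooled label distribution. weight=0 gives uniform selection;
    weight=1 gives purely distance-proportional selection."""
    num_classes = len(client_counts[0])
    pooled = [sum(c[k] for c in client_counts) for k in range(num_classes)]
    pooled_dist = normalize(pooled)
    dists = [label_wasserstein(normalize(c), pooled_dist) for c in client_counts]
    n = len(client_counts)
    total = sum(dists)
    if total == 0:
        # Every client matches the pooled distribution: fall back to uniform.
        return [1.0 / n] * n
    return [(1 - weight) / n + weight * d / total for d in dists]


# Two ordinary clients share classes 0 and 1; a Maverick holds all of class 2.
clients = [[50, 50, 0], [50, 50, 0], [0, 0, 100]]
print(selection_probabilities(clients, weight=1.0))
```

With `weight=1.0`, the Maverick's distance (2/3) is twice that of each ordinary client (1/3), so it receives selection probability 0.5 versus 0.25 for each of the others. Varying `weight` over training rounds is one simple way to make such a preference adaptive; how FedEMD itself adapts it is described in the paper, not here.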
Subject
client selection
data heterogeneity
Federated learning
Shapley value
Wasserstein distance
To reference this document use:
http://resolver.tudelft.nl/uuid:691514b3-285d-4a75-af82-12d53b3e16ce
DOI
https://doi.org/10.1007/978-3-031-33377-4_21
Publisher
Springer
ISBN
9783031333767
Source
Advances in Knowledge Discovery and Data Mining - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2023, Proceedings
Event
27th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2023, 2023-05-25 → 2023-05-28, Osaka, Japan
Series
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 0302-9743, 13936 LNCS
Part of collection
Institutional Repository
Document type
conference paper
Rights
© 2023 J. Huang, C. Hong, Yang Liu, Lydia Y. Chen, S. Roos