Multi-server Asynchronous Federated Learning

Master Thesis (2023)
Author(s)

Y. Zuo (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Lydia Y. Chen – Mentor (TU Delft - Data-Intensive Systems)

Jeremie Decouchant – Graduation committee member (TU Delft - Data-Intensive Systems)

B.A. Cox – Coach (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Yuncong Zuo
Publication Year
2023
Language
English
Graduation Date
23-08-2023
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In federated learning (FL) systems, a server maintains a global model trained by a set of clients on their local datasets. Conventional synchronous FL systems are very sensitive to system heterogeneity, since the server must wait for the slowest clients in each round. Asynchronous FL partially addresses this bottleneck by processing updates as soon as they are received. With a single server, however, system performance suffers when clients are located far from the server and incur very high communication costs. Another issue in single-server settings is that the number of clients is limited, since the server can be overloaded by heavy communication and computation workloads. Moreover, a crash of the central server is fatal to a single-server system. Multi-server FL reduces the average communication cost by decreasing the distance between servers and clients. However, the bottleneck caused by the slowest clients still exists in multi-server systems that preserve synchrony, such as Hierarchical FL. The approach we follow in this thesis consists in replicating the server so that the global training process remains asynchronous. We propose MultiAsync, a novel asynchronous multi-server FL framework that addresses both the single-server and the synchronous-system bottlenecks.
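To make the asynchronous update handling concrete, the sketch below shows a generic staleness-weighted mixing rule of the kind used in asynchronous FL: the server blends each client update into the global model as soon as it arrives, rather than waiting out a synchronous round. This is a hypothetical illustration only; the function names, the `alpha` parameter, and the staleness decay are assumptions, not MultiAsync's actual aggregation rule, which is defined in the thesis itself.

```python
# Minimal sketch of asynchronous FL aggregation (staleness-weighted mixing).
# Hypothetical illustration; not the aggregation rule used by MultiAsync.

def staleness_weight(alpha, staleness):
    """Down-weight stale updates: the older the model version a client
    trained from, the smaller its mixing factor (assumed decay rule)."""
    return alpha / (1.0 + staleness)

def apply_update(global_model, client_model, alpha, staleness):
    """Blend a single client update into the global model immediately on
    arrival, instead of waiting for all clients as in synchronous FL."""
    a = staleness_weight(alpha, staleness)
    return [(1.0 - a) * g + a * c for g, c in zip(global_model, client_model)]

# Example: a client trained from a model two versions old (staleness 2),
# so its update is mixed in with a reduced weight of 0.5 / 3.
w = [0.0, 1.0]
w = apply_update(w, [1.0, 2.0], alpha=0.5, staleness=2)
```

In a multi-server setting, each replica would apply such a rule locally and then reconcile its model with the other servers, which is the coordination problem the thesis studies.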

Files

YuncongZuo_thesis.pdf
(pdf | 1.86 Mb)
License info not available