Multi-server Asynchronous Federated Learning
Y. Zuo (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Lydia Y. Chen – Mentor (TU Delft - Data-Intensive Systems)
Jeremie Decouchant – Graduation committee member (TU Delft - Data-Intensive Systems)
B.A. Cox – Coach (TU Delft - Data-Intensive Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
In federated learning (FL) systems, a server maintains a global model trained by a set of clients on their local datasets. Conventional synchronous FL systems are highly sensitive to system heterogeneity because the server must wait for the slowest clients in each round. Asynchronous FL partially removes this bottleneck by processing client updates as soon as they arrive. With a single server, however, performance degrades when clients are located far from the server and incur high communication costs. Another issue in single-server settings is that the number of clients is limited, since the server can be overloaded by communication and computation. Moreover, a crash of the central server is fatal to a single-server system. Multi-server FL reduces the average communication cost by decreasing the distance between servers and clients, but the bottleneck caused by the slowest clients persists in multi-server systems that preserve synchrony, such as Hierarchical FL. The approach we follow in this paper is to replicate the server while keeping the global training process asynchronous. We propose MultiAsync, a novel asynchronous multi-server FL framework that addresses both the single-server and the synchronous-system bottlenecks.
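To make the contrast with synchronous FL concrete, the following is a minimal sketch of how an asynchronous server might fold a single client update into the global model the moment it arrives, discounting stale updates. The mixing rule and all names (mix_update, alpha, client_version, server_version) are illustrative assumptions for exposition only; they are not the MultiAsync protocol proposed in this work.

```python
import numpy as np

def mix_update(global_model: np.ndarray,
               client_model: np.ndarray,
               client_version: int,
               server_version: int,
               alpha: float = 0.6) -> np.ndarray:
    """Fold one client update into the global model as soon as it arrives.

    The client trained starting from model version `client_version`; the
    server has meanwhile advanced to `server_version`. Staler updates get a
    smaller mixing weight, so slow clients never block the round and cannot
    drag the global model backwards.
    """
    staleness = server_version - client_version
    weight = alpha / (1.0 + staleness)  # weight decays with staleness
    return (1.0 - weight) * global_model + weight * client_model

# Example: an update trained on version 3 arrives while the server is at 7.
global_model = np.zeros(10)
client_model = np.ones(10)
new_global = mix_update(global_model, client_model,
                        client_version=3, server_version=7)
```

In a multi-server setting such as the one studied here, each of several servers would apply updates from its nearby clients in this asynchronous fashion, which is what removes both the distance-related communication cost and the wait-for-the-slowest bottleneck described above.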