Federated Learning with Rebalanced Dataset


Abstract

With the widespread application of artificial intelligence, centralized machine learning approaches, which require access to users' local data, have raised concerns about data privacy. In response, federated learning has been proposed: an architecture that aggregates models trained locally on each client's own data. This approach addresses the data privacy issues inherent in centralized machine learning while also alleviating high communication costs and server resource demands. However, because it trains a single global model, federated learning struggles to meet the diverse personalized needs of clients and suffers significant accuracy degradation when client data distributions are imbalanced or non-IID. Personalized federated learning has been introduced to address these problems of data heterogeneity and personalization. Its goal is not to train a single global model but to ensure that each participating client obtains a local model that meets its individual needs. Yet personalized federated learning has its own shortcoming: the global model often becomes merely an intermediate product of the framework, which loses the advantage of learning a generalized model. This thesis proposes a new personalized federated learning scheme based on parameter decoupling, Federated Learning with Rebalanced Dataset (FedReb). By introducing a rebalanced dataset generated according to the distribution of the clients' local data, the framework achieves high accuracy for both the global model and the average client model. Comparative experiments demonstrate its superior scalability and robustness over other federated learning and personalized federated learning algorithms, and the thesis suggests configurations that achieve the best results at reasonable cost.
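The abstract does not specify FedReb's aggregation rule, so the following is only a generic illustration of parameter decoupling, not FedReb itself, and it omits the rebalanced dataset entirely. It assumes each client model is split into a shared "body", averaged on the server FedAvg-style with weights proportional to local sample counts, and a personal "head" that is never aggregated. The function names and the `body`/`head` keys are hypothetical.

```python
import numpy as np


def weighted_average(states, weights):
    """FedAvg-style average of parameter dicts, weighted by local sample counts."""
    total = float(sum(weights))
    return {
        key: sum(w * state[key] for state, w in zip(states, weights)) / total
        for key in states[0]
    }


def aggregate_decoupled(client_states, num_samples, shared_keys):
    """Hypothetical decoupled aggregation: average only the shared parameters.

    Parameters not listed in shared_keys (e.g. a classifier head) stay local,
    so each client keeps its personalized part and receives the new shared part.
    """
    shared_only = [{k: s[k] for k in shared_keys} for s in client_states]
    global_shared = weighted_average(shared_only, num_samples)

    personalized = []
    for state in client_states:
        new_state = dict(state)  # keeps this client's own head parameters
        new_state.update({k: v.copy() for k, v in global_shared.items()})
        personalized.append(new_state)
    return global_shared, personalized


# Toy example: two clients with 1 and 3 local samples respectively.
clients = [
    {"body": np.array([1.0, 1.0]), "head": np.array([0.0])},
    {"body": np.array([3.0, 3.0]), "head": np.array([5.0])},
]
global_shared, personalized = aggregate_decoupled(clients, [1, 3], shared_keys=["body"])
```

Here the shared body becomes (1·1 + 3·3) / 4 = 2.5 per entry, while each client's head is left untouched. A scheme that, like the one described above, also wants an accurate global model would additionally need a global head; how FedReb obtains one from the rebalanced dataset is not stated in the abstract.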
Additionally, a testbed has been established and the algorithm has been deployed on it, verifying its feasibility in real-world setups.