Decentralized learning is a paradigm that enables machine learning in a distributed and decentralized manner. A common challenge in this setting is that data is not independently and identically distributed (non-IID) across clients. Under such conditions, it has been shown that node churn, where clients leave and rejoin the system, leads to reduced generalization performance and slower convergence. This degradation occurs because certain data classes may exist on only a few clients; if those clients drop out, the global model may lose access to important parts of the data distribution. This setting poses an important question: How can we mitigate the impact of node churn in decentralized learning systems so that member contributions persist? To address this challenge, we empirically study the effectiveness of data augmentation, specifically extending local datasets with small synthetic datasets that are received from neighbors and generated using their local data. We further enhance this approach with a supervised contrastive loss applied jointly to the synthetic and local data, which we refer to as synthetic anchors. Through experiments on the MNIST and CIFAR10 datasets, we demonstrate that data augmentation and synthetic anchors effectively mitigate the effects of churn and help preserve member contributions in decentralized learning.
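To make the contrastive component concrete, the following is a minimal PyTorch sketch of a standard supervised contrastive loss (in the style of Khosla et al., 2020) computed over a batch that mixes local samples with synthetic samples received from neighbors, alongside the usual cross-entropy term. The function names, the model interface returning (logits, embeddings), and the weighting factor lam are illustrative assumptions rather than details taken from the paper.

import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    # Supervised contrastive loss over a mixed batch of local and synthetic samples.
    z = F.normalize(embeddings, dim=1)                         # (N, d) unit-norm features
    sim = torch.matmul(z, z.T) / temperature                   # pairwise similarities
    sim = sim - sim.max(dim=1, keepdim=True).values.detach()   # numerical stability
    n = z.size(0)
    not_self = ~torch.eye(n, dtype=torch.bool, device=z.device)
    # Denominator: all other samples in the batch (self-similarity excluded).
    log_prob = sim - torch.log((torch.exp(sim) * not_self).sum(dim=1, keepdim=True))
    # Positives: other samples (local or synthetic) sharing the anchor's label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                                     # skip anchors with no positives
    loss = -(log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]
    return loss.mean()

def local_training_step(model, local_x, local_y, syn_x, syn_y, lam=0.5):
    # Augmentation: synthetic samples from neighbors are concatenated with the
    # local mini-batch before the forward pass (assumed model returns logits and embeddings).
    x = torch.cat([local_x, syn_x], dim=0)
    y = torch.cat([local_y, syn_y], dim=0)
    logits, features = model(x)
    return F.cross_entropy(logits, y) + lam * supcon_loss(features, y)

In this sketch, the synthetic samples act as additional anchors and positives in the contrastive term, pulling local features toward class representations that would otherwise disappear when the contributing client churns out.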