BRISTLE: Decentralized Federated Learning in Byzantine, Non-i.i.d. Environments


Abstract

Federated learning (FL) is a type of machine learning in which devices locally train a model on their private data. The devices iteratively communicate this model to a central server, which combines the models and sends the updated model back to all devices. Because the data stays on the devices and only the model is transmitted, federated learning is considered a privacy-friendly alternative to regular machine learning, where all data is transmitted over the internet. However, the central server used in typical FL systems not only poses a single point of failure susceptible to crashes or hacks, but may also become a performance bottleneck. These issues are alleviated by decentralized FL (DFL), where peers communicate model updates with each other instead of with a single server. Unfortunately, DFL is challenging since (1) the training data possessed by different peers is often non-i.i.d. (i.e., distributed differently across peers) and (2) malicious, or Byzantine, attackers can share arbitrary model updates with other peers to subvert the training process. We address these two challenges and present Bristle, middleware between the learning application and the decentralized network layer. Bristle leverages transfer learning to predetermine and freeze the non-output layers of a neural network, significantly speeding up model training and lowering communication costs. To securely update the output layer with model updates from other peers, we design a fast distance-based prioritizer and a novel performance-based integrator. The prioritizer ranks incoming model updates based on their distance to the peer's own model and an explore-exploit trade-off, and the integrator integrates each class of each model update separately based on its performance on a small set of i.i.d. test samples. Their combined effect results in high resilience to Byzantine attackers and the ability to handle non-i.i.d. classes. We empirically show that Bristle converges to a consistent 95% accuracy in Byzantine environments, outperforming all evaluated baselines. In non-Byzantine environments, Bristle requires 83% fewer iterations to achieve 90% accuracy compared to state-of-the-art methods. When the training classes are non-i.i.d., Bristle achieves 2.3x higher accuracy than the most Byzantine-resilient baselines while reducing communication costs by 90%.
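
The transfer-learning idea described above amounts to pre-training the feature-extraction layers once and then only ever training and exchanging the output layer. The following is a minimal PyTorch-style sketch of that idea; the network architecture, layer sizes, and the "pretrained_body.pt" checkpoint are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Illustrative network: a pre-trained feature extractor ("body") plus an output layer.
body = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
)
output_layer = nn.Linear(128, 10)

# Hypothetical checkpoint obtained via transfer learning on a public dataset:
# body.load_state_dict(torch.load("pretrained_body.pt"))

# Freeze the non-output layers: only the output layer is trained locally.
for param in body.parameters():
    param.requires_grad = False

model = nn.Sequential(body, output_layer)
optimizer = torch.optim.SGD(output_layer.parameters(), lr=0.01)

# Only the output layer's parameters need to be shared with other peers,
# which is what keeps training fast and communication costs low.
update_to_share = {k: v.detach().clone() for k, v in output_layer.state_dict().items()}
```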
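The prioritizer and integrator operate only on the received output layers. The sketch below illustrates the general idea under simplifying assumptions: Euclidean distance to the peer's own output layer, an epsilon-greedy style explore-exploit selection, and per-class weighting by accuracy on a small i.i.d. test set. The exact scoring and weighting rules used by Bristle differ; the function names `prioritize`, `per_class_accuracy`, and `integrate` are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def prioritize(own, received, k=5, explore_frac=0.2):
    """Select k received output layers: mostly the ones closest to our own
    model (exploit), plus a few randomly chosen ones (explore)."""
    dists = [np.linalg.norm(own - r) for r in received]
    order = np.argsort(dists)
    n_explore = int(k * explore_frac)
    exploit = list(order[: k - n_explore])
    remaining = list(order[k - n_explore:])
    explore = (list(rng.choice(remaining, size=min(n_explore, len(remaining)), replace=False))
               if remaining else [])
    return [received[i] for i in exploit + explore]

def per_class_accuracy(weights, test_x, test_y, num_classes):
    """Accuracy of a linear output layer for each class on a small i.i.d. test set.
    weights has shape (num_classes, num_features)."""
    preds = (test_x @ weights.T).argmax(axis=1)
    return np.array([
        (preds[test_y == c] == c).mean() if np.any(test_y == c) else 0.0
        for c in range(num_classes)
    ])

def integrate(own, candidates, test_x, test_y, num_classes):
    """Integrate each class (output row) separately, weighting candidate rows by
    how well that class performs on the i.i.d. test samples."""
    new = own.copy()
    own_acc = per_class_accuracy(own, test_x, test_y, num_classes)
    for c in range(num_classes):
        rows, weights = [own[c]], [own_acc[c] + 1e-6]
        for cand in candidates:
            acc = per_class_accuracy(cand, test_x, test_y, num_classes)[c]
            if acc >= own_acc[c]:  # only integrate rows that do not hurt class c
                rows.append(cand[c])
                weights.append(acc + 1e-6)
        new[c] = np.average(rows, axis=0, weights=weights)
    return new
```

Evaluating each class row against a small i.i.d. test set is what lets a peer benefit from classes it barely observes locally while discarding Byzantine rows that degrade performance.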