R. Wang

13 records found

Exploring anomaly bias in backdoor attacks

Journal article (2026) - Hua Wang, Shaoxiong Wang, Lianhua Wang, Rui Wang
Federated learning (FL) allows multiple parties to collaboratively train machine learning models by uploading model updates instead of raw data, thereby protecting data privacy and reducing communication overhead. However, the open nature of public networks makes them vulnerable to attacks. By injecting poisoned samples with backdoor triggers during training and uploading malicious updates, an attacker can manipulate the global model to produce any specified target label. Existing defenses against backdoor attacks have limitations, such as high attack success rates or the need to know or restrict the number of compromised clients controlled by the attacker. To address these shortcomings, we propose FLAB, a novel defense to filter out malicious updates. Specifically, we introduce the concept of anomaly bias to characterize each model update and propose a detection mechanism to quantify their anomalous degrees. By clustering anomaly biases and iteratively reducing the size of the cluster, the anomaly bias associated with the attacker is identified. Finally, all updates with this bias are considered malicious and removed. We conduct exhaustive evaluations of FLAB. Experimental results demonstrate that, compared to existing defenses, FLAB achieves comparable model accuracy while significantly reducing attack success rates. Furthermore, FLAB maintains robust performance even when the number of compromised clients exceeds 80%. ...

Byzantine-robust federated learning through clustering model update parameters

Journal article (2025) - Hua Wang, Shaoxiong Wang, Rui Wang, Pengxiang Wang
Federated Learning (FL) is a distributed machine learning paradigm that enables multiple clients to collaboratively train a model without sharing their private data. However, its distributed nature makes FL vulnerable to Byzantine attacks. Most existing Byzantine-robust FL schemes have limitations, such as ineffective defense against well-crafted malicious updates or degraded performance in non-independent and identically distributed (non-IID) data scenarios. To address these challenges, we propose FedCmp, a robust FL framework with an anomaly detection mechanism. Our approach identifies malicious updates by leveraging a significant disparity in vote counts between benign and compromised clients. We first propose a clustering strategy for update parameters, followed by the implementation of a multi-round voting mechanism to accelerate vote accumulation for benign or compromised clients based on parameter diversity. Finally, following the majority principle, malicious updates are accurately filtered out without compromising the contributions of benign clients. Experimental results demonstrate that FedCmp outperforms existing robust FL schemes and maintains high accuracy even in highly non-IID data scenarios. ...
Current black-box backdoor attacks in convolutional neural networks formulate attack objective(s) as single-objective optimization problems in a single domain. Designing triggers in a single domain harms semantics and trigger robustness, and introduces visual and spectral anomalies. This work proposes a multi-objective black-box backdoor attack in dual domains via an evolutionary algorithm (LADDER), the first instance of achieving multiple attack objectives simultaneously by optimizing triggers without requiring prior knowledge about the victim model. In particular, we formulate LADDER as a multi-objective optimization problem (MOP) and solve it via a multi-objective evolutionary algorithm (MOEA). MOEA maintains a population of triggers with trade-offs among attack objectives and uses non-dominated sort to drive triggers toward optimal solutions. We further apply preference-based selection to MOEA to exclude impractical triggers. LADDER investigates a new dual-domain perspective for trigger stealthiness by minimizing the anomaly between clean and poisoned samples in the spectral domain. Lastly, the robustness against preprocessing operations is achieved by pushing triggers to low-frequency regions. Extensive experiments comprehensively showcase that LADDER achieves attack effectiveness of at least 99%, attack robustness with 90.23% (50.09% higher than state-of-the-art attacks on average), superior natural stealthiness (1.12× to 196.74× improvement) and excellent spectral stealthiness (8.45× enhancement) as compared to current stealthy attacks by the average l2-norm across 5 public datasets. ...

Taming Malicious Majorities in Federated Learning using Privacy-preserving Byzantine-robust Clustering

Conference paper (2025) - Rui Wang, Xingkai Wang, Huanhuan Chen, Jérémie Decouchant, Stjepan Picek, Nikolaos Laoutaris, Kaitai Liang
Byzantine-robust Federated Learning (FL) aims to counter malicious clients and train an accurate global model while maintaining an extremely low attack success rate. Most existing systems, however, are only robust when most of the clients are honest. FLTrust (NDSS '21) and Zeno++ (ICML '20) do not make such an honest majority assumption but can only be applied to scenarios where the server is provided with an auxiliary dataset used to filter malicious updates. FLAME (USENIX '22) and EIFFeL (CCS '22) maintain the semi-honest majority assumption to guarantee robustness and the confidentiality of updates. It is, therefore, currently impossible to ensure Byzantine robustness and confidentiality of updates without assuming a semi-honest majority. To tackle this problem, we propose a novel Byzantine-robust and privacy-preserving FL system, called MUDGUARD, to capture malicious minority and majority for server and client sides, respectively. Our experimental results demonstrate that the accuracy of MUDGUARD is practically close to the FL baseline using FedAvg without attacks (≈0.8% gap on average). Meanwhile, the attack success rate is around 0%-5% even under an adaptive attack tailored to MUDGUARD. We further optimize our design by using binary secret sharing and polynomial transformation, leading to communication overhead and runtime decreases of 67%-89.17% and 66.05%-68.75%, respectively. ...

Fast and Secure Vertical Federated Learning based on XGBoost for Decentralized Labels

Journal article (2024) - Rui Wang, Oguzhan Ersoy, Hangyu Zhu, Yaochu Jin, Kaitai Liang
Vertical Federated Learning (VFL) enables multiple clients to collaboratively train a global model over vertically partitioned data without leaking private local information. Tree-based models, like XGBoost and LightGBM, have been widely used in VFL to enhance the interpretation and efficiency of training. However, there is a fundamental lack of research on how to conduct VFL securely over distributed labels. This work is the first to fill this gap by designing a novel protocol, called FEVERLESS, based on XGBoost. FEVERLESS leverages secure aggregation via information masking technique and global differential privacy provided by a fairly and randomly selected noise leader to prevent private information from being leaked in the training process. Furthermore, it provides label and data privacy against honest-but-curious adversaries even in the case of collusion of $n - 2$ out of $n$ clients. We present a comprehensive security and efficiency analysis for our design, and the empirical results from our experiments demonstrate that FEVERLESS is fast and secure. In particular, it outperforms the solution based on additive homomorphic encryption in runtime cost and provides better accuracy than the local differential privacy approach. ...

An Anonymous Graph Convolutional Network Against Edge-Perturbing Attacks

Journal article (2024) - Ao Liu, Beibei Li, Tao Li, Pan Zhou, Rui Wang
Recent studies have revealed the vulnerability of graph convolutional networks (GCNs) to edge-perturbing attacks, such as maliciously inserting or deleting graph edges. However, theoretical proof of such vulnerability remains a big challenge, and effective defense schemes are still open issues. In this article, we first generalize the formulation of edge-perturbing attacks and strictly prove the vulnerability of GCNs to such attacks in node classification tasks. Following this, an anonymous GCN, named AN-GCN, is proposed to defend against edge-perturbing attacks. In particular, we present a node localization theorem to demonstrate how GCNs locate nodes during their training phase. In addition, we design a staggered Gaussian noise-based node position generator and a spectral graph convolution-based discriminator (in detecting the generated node positions). Furthermore, we provide an optimization method for the designed generator and discriminator. It is demonstrated that the AN-GCN is secure against edge-perturbing attacks in node classification tasks, as AN-GCN is developed to classify nodes without the edge information (making it impossible for attackers to perturb edges anymore). Extensive evaluations verify the effectiveness of the general edge-perturbing attack (G-EPA) model in manipulating the classification results of the target nodes. More importantly, the proposed AN-GCN can achieve 82.7% in node classification accuracy without the edge-reading permission, which outperforms the state-of-the-art GCN. ...
Doctoral thesis (2024) - R. Wang
Federated Learning (FL) is a revolutionary approach to machine learning that enables collaborative model training among multiple parties without exposing sensitive data. Introduced by Google in 2016, FL taps into the wealth of data generated by edge devices while prioritizing user privacy and minimizing communication costs. Its applications span diverse sectors like healthcare, finance, and the Internet of Things. While FL offers significant benefits, it grapples with privacy concerns due to the risk of revealing sensitive information during the exchange of model updates. Security issues also arise, as some clients may behave unpredictably or maliciously, posing a threat to model accuracy or introducing vulnerabilities. Although various efforts have tackled these challenges, new technical hurdles persist, requiring innovative solutions. This thesis explores the practicality of privacy-preserving horizontal and vertical FL and enhances the Byzantine robustness of FL to ensure effective training even in the presence of malicious clients. In investigating privacy-preserving horizontal FL, the thesis uncovers practical issues. Encryption-based solutions, reliant on a Trusted Third Party (TTP) for key distribution, involve frequent, costly, and potentially unreliable communication between a central server and distributed clients, placing a significant computational burden on FL. Privacy-preserving vertical FL faces challenges, particularly when assuming that only one client possesses all labels for all samples. In real-world healthcare scenarios, where diagnoses are spread across different hospitals, label inconsistencies question the feasibility of this assumption. Addressing the Byzantine robustness of FL raises a critical consideration: many existing systems assume an honest majority of clients. In reality, FL operates in environments with competing interests, where clients may manipulate the learning process. Recognizing the potential for a malicious majority of clients becomes crucial. In essence, resolving these three core issues is essential for integrating FL into real-world scenarios. This thesis aims to contribute innovative methods and tools to overcome these challenges, paving the way for the widespread adoption of FL across diverse domains. ...
Journal article (2024) - Yanqi Qiao, Dazhuang Liu, Rui Wang, Kaitai Liang
Federated Learning (FL) is a beneficial decentralized learning approach for preserving the privacy of local datasets of distributed agents. However, the distributed property of FL and untrustworthy data introduce vulnerability to backdoor attacks. In this attack scenario, an adversary manipulates its local data with a specific trigger and trains a malicious local model to implant the backdoor. During inference, the global model misbehaves on any input with the trigger, producing the attacker-chosen prediction. Most existing backdoor attacks against FL focus on bypassing defense mechanisms, without considering the inspection of model parameters on the server. These attacks are susceptible to detection through dynamic clustering based on model parameter similarity. Besides, current methods provide limited imperceptibility of their trigger in the spatial domain. To address these limitations, we propose a stealthy backdoor attack called "Chironex" against FL with an imperceptible trigger in frequency space to deliver attack effectiveness, stealthiness and robustness against various countermeasures on FL. We first design a frequency trigger function to generate an imperceptible frequency trigger to evade human inspection. Then we fully exploit the attacker's advantage to enhance attack robustness by estimating benign updates and analyzing the impact of the backdoor on model parameters through a task-sensitive neuron searcher. It disguises malicious updates as benign ones by reducing the impact of backdoor neurons that greatly contribute to the backdoor task based on activation value, and encouraging them to update towards benign model parameters trained by the attacker. We conduct extensive experiments on various image classifiers with real-world datasets to provide empirical evidence that Chironex can evade the most recent robust FL aggregation algorithms, and further achieve a distinctly higher attack success rate than existing attacks, without undermining the utility of the global model. ...

Taming Malicious Majorities in Federated Learning using Privacy-preserving Byzantine-robust Clustering

Journal article (2024) - Rui Wang, Xingkai Wang, Huanhuan Chen, Jérémie Decouchant, Stjepan Picek, Nikolaos Laoutaris, Kaitai Liang
Byzantine-robust Federated Learning (FL) aims to counter malicious clients and train an accurate global model while maintaining an extremely low attack success rate. Most existing systems, however, are only robust when most of the clients are honest. FLTrust (NDSS '21) and Zeno++ (ICML '20) do not make such an honest majority assumption but can only be applied to scenarios where the server is provided with an auxiliary dataset used to filter malicious updates. FLAME (USENIX '22) and EIFFeL (CCS '22) maintain the semi-honest majority assumption to guarantee robustness and the confidentiality of updates. It is, therefore, currently impossible to ensure Byzantine robustness and confidentiality of updates without assuming a semi-honest majority. To tackle this problem, we propose a novel Byzantine-robust and privacy-preserving FL system, called MUDGUARD, to capture malicious minority and majority for server and client sides, respectively. Our experimental results demonstrate that the accuracy of MUDGUARD is practically close to the FL baseline using FedAvg without attacks (approximately 0.8% gap on average). Meanwhile, the attack success rate is around 0%-5% even under an adaptive attack tailored to MUDGUARD. We further optimize our design by using binary secret sharing and polynomial transformation, leading to communication overhead and runtime decreases of 67%-89.17% and 66.05%-68.75%, respectively. ...

Privacy-Preserving Vertical Federated Learning Over Distributed Labels

Journal article (2023) - Hangyu Zhu, Rui Wang, Yaochu Jin, Kaitai Liang
Federated learning (FL) is an emerging privacy-preserving machine learning protocol that allows multiple devices to collaboratively train a shared global model without revealing their private local data. Nonparametric models like gradient boosting decision trees (GBDTs) have been commonly used in FL for vertically partitioned data. However, all these studies assume that all the data labels are stored on only one client, which may be unrealistic for real-world applications. Therefore, in this article, we propose a secure vertical FL framework, named privacy-preserving vertical federated learning system over distributed labels (PIVODL), to train GBDTs with data labels distributed on multiple devices. Both homomorphic encryption and differential privacy are adopted to prevent label information from being leaked through transmitted gradients and leaf values. Our experimental results show that both information leakage and model performance degradation of the proposed PIVODL are negligible. Impact Statement - Federated learning is a distributed machine learning framework proposed for privacy preservation. Most federated learning algorithms work on horizontally partitioned data, with only a few exceptions considering vertically partitioned data that is widely seen in the real world. However, existing vertical federated learning makes an unrealistic assumption that data labels are distributed on only one device, and no research has been reported so far that considers data labels distributed on multiple client devices. The PIVODL framework reported in this article allows us to build a secure vertical federated XGBoost system, in which the labels may be distributed either on one device or on multiple devices, making it possible to apply federated learning to a wider range of real-world problems. ...
Conference paper (2022) - J. Xu, R. Wang, S. Koffas, K. Liang, S. Picek
Graph Neural Networks (GNNs) are a class of deep learning-based methods for processing graph domain information. GNNs have recently become a widely used graph analysis method due to their superior ability to learn representations for complex graph data. Due to privacy concerns and regulation restrictions, centralized GNNs can be difficult to apply to data-sensitive scenarios. Federated learning (FL) is an emerging technology developed for privacy-preserving settings when several parties need to train a shared global model collaboratively. Although several research works have applied FL to train GNNs (Federated GNNs), there is no research on their robustness to backdoor attacks.

This paper bridges this gap by conducting two types of backdoor attacks in Federated GNNs: centralized backdoor attacks (CBA) and distributed backdoor attacks (DBA). Our experiments show that the DBA attack success rate is higher than CBA in almost all cases. For CBA, the attack success rate of all local triggers is similar to the global trigger, even if the training set of the adversarial party is embedded with the global trigger. To explore the properties of two backdoor attacks in Federated GNNs, we evaluate the attack performance for a different number of clients, trigger sizes, poisoning intensities, and trigger densities. Finally, we explore the robustness of DBA and CBA against two state-of-the-art defenses. We find that both attacks are robust against the investigated defenses, making it necessary to consider backdoor attacks in Federated GNNs as a novel threat that requires custom defenses. ...
Journal article (2021) - Hangyu Zhu, Rui Wang, Yaochu Jin, Kaitai Liang, Jianting Ning
Homomorphic encryption is a very useful gradient protection technique used in privacy-preserving federated learning. However, existing encrypted federated learning systems need a trusted third party to generate and distribute key pairs to connected participants, making them unsuited for federated learning and vulnerable to security risks. Moreover, encrypting all model parameters is computationally intensive, especially for large machine learning models such as deep neural networks. In order to mitigate these issues, we develop a practical, computationally efficient encryption-based protocol for federated deep learning, where the key pairs are collaboratively generated without the help of a trusted third party. By quantization of the model parameters on the clients and an approximated aggregation on the server, the proposed method avoids encryption and decryption of the entire model. In addition, a threshold-based secret sharing technique is designed so that no one can hold the global private key for decryption, while aggregated ciphertexts can be successfully decrypted by a threshold number of clients even if some clients are offline. Our experimental results confirm that the proposed method significantly reduces the communication costs and computational complexity compared to existing encrypted federated learning without compromising the performance and security. ...

Investigating Arbitrageurs and Oracle Manipulators in Ethereum

Conference paper (2021) - Kevin Tjiam, Rui Wang, Huanhuan Chen, Kaitai Liang
Smart contracts on Ethereum enable billions of dollars to be transacted in a decentralized, transparent and trustless environment. However, adversaries lie in wait in the Dark Forest, ready to exploit any and all smart contract vulnerabilities in order to extract profits from unsuspecting victims in this new financial system. As the blockchain space moves at a breakneck pace, exploits on smart contract vulnerabilities rapidly evolve, and existing research quickly becomes obsolete. It is imperative that smart contract developers stay up to date on the current most damaging vulnerabilities and countermeasures to ensure the security of users' funds, and to collectively ensure the future of Ethereum as a financial settlement layer. This research work focuses on two smart contract vulnerabilities: transaction-ordering dependency and oracle manipulation. Combined, these two vulnerabilities have been exploited to extract hundreds of millions of dollars from smart contracts in the past year (2020-2021). For each of them, this paper presents: (1) a literature survey from recent (as of 2021) formal and informal sources; (2) a reproducible experiment as code demonstrating the vulnerability and, where applicable, countermeasures to mitigate the vulnerability; and (3) analysis and discussion on proposed countermeasures. To conclude, strengths, weaknesses and trade-offs of these countermeasures are summarised, inspiring directions for future research. ...