S. Picek
116 records found
Let's focus
Focused backdoor attack against federated transfer learning
Federated Transfer Learning (FTL) is the most general form of Federated Learning (FL). In FTL, one party, usually the server, pre-trains a feature extractor on public data. Then, clients collaboratively train a classifier by updating only the classification layers on their private data. This raises doubts about whether local poisoning attacks can effectively backdoor the full model. Unlike in FL, where attackers can shift model attention via poisoned inputs, FTL's fixed feature extractor, set during server pre-training, limits this possibility. In this paper, we investigate this scenario to identify and exploit a vulnerability obtained by combining eXplainable AI (XAI) and dataset distillation. Our proposed attack can be carried out by one of the clients during the FL phase of FTL by identifying the optimal position for the trigger through XAI and encapsulating compressed information of the backdoor class. Due to its behavior, we refer to our approach as a focused backdoor approach (FB-FTL for short) and test its performance by referencing image and text classification scenarios. Our attack is effective against existing defenses for FL, as it achieves an average of 80% attack success rate.
We evaluate our method on five datasets and four popular models. Our results show up to a 100% attack success rate in both white-box and black-box settings (including real-world applications like Vertex AI), revealing a severe vulnerability for tabular data. Our method is shown to surpass previous work like Tabdoor in terms of performance, while remaining stealthy against state-of-the-art defense mechanisms. We evaluate our attack against Spectral Signatures, Neural Cleanse, Beatrix, and Fine-Pruning, all of which fail to defend successfully against it. We also verify that our attack successfully bypasses popular outlier detection mechanisms.
Your PIN is Mine
Uncovering Users' PINs at Point of Sale Machines
Point of Sale (PoS) machines have become extremely popular recently. In many economies, most transactions occur using them. Although PoS technology is evolving, PINs are still heavily used. In this paper, we perform a large-scale study to understand how difficult it is to uncover user PINs at PoS, even when the users cover the pad with their hands. Our study involves 142 participants, two types of PoS, and around 13,800 PINs. We develop machine learning techniques to infer PoS PINs by using hidden cameras. Our results show that uncovering PINs in PoS is more complex than in other cases where a user PIN is used, e.g., ATMs, because of the small pad area of PoS. Nevertheless, we could achieve more than 50% Top-3 accuracy for 4-digit PINs and 45% Top-3 accuracy for 5-digit PINs, even when the PIN is covered by the user's hand. We comment on the impact of the camera's position and the PoS type on the successful inference of the user's PINs. We also comment on how the physical distance between digits affects the difficulty of inferring PINs, and we recommend good practices for choosing PINs and covering the PoS pad to make PIN inference harder.
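The Top-3 accuracy figures above count a PIN as recovered when the true PIN appears among the attacker's three highest-ranked guesses. A minimal sketch of that metric follows; the function name `top_k_accuracy` and the toy guess lists are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List


def top_k_accuracy(ranked_guesses: List[List[str]],
                   true_pins: List[str],
                   k: int = 3) -> float:
    """Fraction of PINs whose true value is among the first k ranked guesses.

    ranked_guesses[i] is a hypothetical list of candidate PINs for sample i,
    ordered from most to least likely by some inference model.
    """
    if not true_pins:
        raise ValueError("true_pins must not be empty")
    hits = sum(
        1
        for guesses, pin in zip(ranked_guesses, true_pins)
        if pin in guesses[:k]
    )
    return hits / len(true_pins)


# Toy example: three observed PIN entries with ranked candidate lists.
ranked = [
    ["1234", "1243", "2134"],          # true PIN is the 2nd guess
    ["0000", "1111", "2222"],          # true PIN is never guessed
    ["9999", "5678", "8888", "4321"],  # true PIN is only the 4th guess
]
truth = ["1243", "3333", "4321"]
top3 = top_k_accuracy(ranked, truth, k=3)
```

Raising `k` can only keep the score the same or increase it, which is why Top-3 figures are easier to reach than exact-match accuracy.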
Still Making Noise
Improving Deep-Learning-Based Side-Channel Analysis
Editor’s notes: Side-channel attacks have been undermining cryptosystems for almost three decades. Advances in machine learning techniques have shown great promise in improving the performance and efficiency of side-channel attacks, even on systems with countermeasures. This article provides a systematic approach to applying ML techniques for side-channel attacks.
Membership Inference Attacks (MIAs) infer whether a data point is in the training data of a machine learning model, posing privacy risks to sensitive data like medical records or financial data. Intuitively, data points that an MIA accurately detects are vulnerable. Those data points may exist in the data of different target models, each susceptible to multiple MIAs. As such, the vulnerability of data points under multiple MIAs and target models represents a significant challenge. This article defines several metrics reflecting data points’ vulnerability and capturing vulnerable data points under multiple MIAs and target models. We implement 77 MIAs, with an average attack accuracy over target models ranging from 0.5 to 0.9, to support our analysis with our scalable and flexible platform, Various Membership Inference Attacks Platform (VMIAP). Based on the results, we observe that MIAs have an inference tendency toward some data points despite low overall inference performance. Furthermore, previous approaches are unsuitable for finding vulnerable data points under multiple MIAs and target models. Finally, we explore the impact of retraining target, shadow, and attack models separately on the vulnerability of data points.
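One way to picture per-point vulnerability under several MIAs is to ask, for each data point, what fraction of attacks classify its membership correctly. The sketch below is an illustrative metric only; the abstract does not spell out the article's actual definitions, and the names `point_vulnerability` and `vulnerable_points` as well as the 0.8 threshold are assumptions.

```python
from typing import List


def point_vulnerability(predictions: List[List[int]],
                        is_member: List[int]) -> List[float]:
    """Per-point fraction of attacks that guess membership correctly.

    predictions[a][j] is attack a's guess for point j (1 = member, 0 = not).
    is_member[j] is the ground truth for point j.
    """
    n_attacks = len(predictions)
    return [
        sum(1 for a in range(n_attacks) if predictions[a][j] == is_member[j])
        / n_attacks
        for j in range(len(is_member))
    ]


def vulnerable_points(predictions: List[List[int]],
                      is_member: List[int],
                      threshold: float = 0.8) -> List[int]:
    """Indices of true members that at least `threshold` of attacks expose."""
    rates = point_vulnerability(predictions, is_member)
    return [
        j for j, rate in enumerate(rates)
        if is_member[j] == 1 and rate >= threshold
    ]


# Toy example: three attacks, four data points (the first two are members).
preds = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
members = [1, 1, 0, 0]
rates = point_vulnerability(preds, members)
exposed = vulnerable_points(preds, members)
```

In the toy data, point 0 is exposed by every attack while point 1 is mostly missed, which mirrors the observation that attacks with modest overall accuracy can still agree on particular points.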
Breaking the Blindfold
Deep Learning-based Blind Side-channel Analysis
Physical side-channel analysis (SCA) operates on the foundational assumption of access to known plaintext or ciphertext. However, this assumption can be easily invalidated in various scenarios, ranging from common encryption modes like Offset CodeBook (OCB) to complex hardware implementations, where such data may be inaccessible. Blind SCA addresses this challenge by operating without the knowledge of plaintext or ciphertext. Unfortunately, prior such approaches have shown limited success in practical settings. This paper introduces the Deep Learning-based Blind Side-channel Analysis (DL-BSCA) framework, leveraging deep neural networks to recover secret keys in blind SCA settings. In addition, we propose a novel labeling method, Multi-point Cluster-based (MC) labeling, accounting for dependencies between leakage variables by exploiting multiple sample points for each variable, improving the accuracy of trace labeling. We validate our approach across four datasets, including symmetric key algorithms (AES and ASCON) and a post-quantum cryptography algorithm, Kyber, with platforms ranging from high-leakage 8-bit AVR XMEGA to noisy 32-bit ARM STM32F4. Notably, previous methods failed to recover the key on the same datasets. We demonstrate the first successful blind SCA on a desynchronization countermeasure enabled by DL-BSCA and MC labeling. All experiments are validated with real-world SCA measurements, highlighting the practicality and effectiveness of our approach.
Large Language Models (LLMs) are susceptible to various attacks but can also improve the security of diverse systems. However, how well do open source LLMs behave as covertext distributions to, e.g., facilitate censorship-resistant communication? In this paper, we explore open-source LLM-based covert channels. We empirically measure the security vs. capacity of two open-source LLM models (Llama-7B and GPT-2) to assess their performance as covert channels. Although our results indicate that such channels are not likely to achieve high practical bitrates, we also show that the chance for an adversary to detect covert communication is low. To ensure our results can be used with the least effort as a general reference, we employ a conceptually simple and concise scheme and only assume public models.
It’s a Kind of Magic
A Novel Conditional GAN Framework for Efficient Profiling Side-Channel Analysis
Profiling side-channel analysis (SCA) is widely used to evaluate the security of cryptographic implementations under worst-case attack scenarios. This method assumes a strong adversary with a fully controlled device clone, known as a profiling device, with full access to the internal state of the target algorithm, including the mask shares. However, acquiring such a profiling device in the real world is challenging, as secure products enforce strong life cycle protection, particularly on devices that allow the user partial (e.g., debug mode) or full (e.g., test mode) control. This enforcement restricts access to profiling devices, significantly reducing the effectiveness of profiling SCA. To address this limitation, this paper introduces a novel framework that allows an attacker to create and learn from their own white-box reference design without needing privileged access on the profiling device. Specifically, the attacker first implements the target algorithm on a different type of device with full control. Since this device is a white box to the attacker, they can access all internal states and mask shares. A novel conditional generative adversarial network (CGAN) framework is then introduced to mimic the feature extraction procedure from the reference device and transfer this experience to extract high-order leakages from the target device. These extracted features then serve as inputs for profiled SCA. Experiments show that our approach significantly enhances the efficacy of black-box profiling SCA, matching or potentially exceeding the results of worst-case security evaluations. Compared with conventional profiling SCA, which has strict requirements on the profiling device, our framework relaxes this threat model and, thus, can be better adapted to real-world attacks.
MUDGUARD
Taming Malicious Majorities in Federated Learning using Privacy-preserving Byzantine-robust Clustering
Byzantine-robust Federated Learning (FL) aims to counter malicious clients and train an accurate global model while maintaining an extremely low attack success rate. Most existing systems, however, are only robust when most of the clients are honest. FLTrust (NDSS '21) and Zeno++ (ICML '20) do not make such an honest majority assumption but can only be applied to scenarios where the server is provided with an auxiliary dataset used to filter malicious updates. FLAME (USENIX '22) and EIFFeL (CCS '22) maintain the semi-honest majority assumption to guarantee robustness and the confidentiality of updates. It is therefore currently impossible to ensure Byzantine robustness and confidentiality of updates without assuming a semi-honest majority. To tackle this problem, we propose a novel Byzantine-robust and privacy-preserving FL system, called MUDGUARD, to handle a malicious minority on the server side and a malicious majority on the client side. Our experimental results demonstrate that the accuracy of MUDGUARD is practically close to the FL baseline using FedAvg without attacks (a gap of approximately 0.8% on average). Meanwhile, the attack success rate is around 0%-5% even under an adaptive attack tailored to MUDGUARD. We further optimize our design by using binary secret sharing and polynomial transformation, reducing communication overhead and runtime by 67%-89.17% and 66.05%-68.75%, respectively.
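For context on the FedAvg baseline mentioned above, the aggregation step can be sketched as a data-size-weighted average of client model parameters. This is a minimal illustration of plain FedAvg only, not of MUDGUARD's clustering or secret sharing; the function name `fed_avg` and the flat-list model representation are assumptions.

```python
from typing import List


def fed_avg(client_weights: List[List[float]],
            client_sizes: List[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one positive dataset size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


# Toy example: two clients, the second holds three times as much data.
global_model = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Because every submitted vector enters the average directly, a client that sends a manipulated update shifts the global model, which is the weakness Byzantine-robust aggregation is meant to address.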
I Choose You
Automated Hyperparameter Tuning for Deep Learning-based Side-channel Analysis
The use of deep learning-based side-channel analysis is an effective way of performing profiling attacks on power and electromagnetic leakages, even against targets protected with countermeasures. While many research articles have reported successful results, they typically focus on profiling and attacking a single device, assuming that leakages are similar between devices of the same type. However, this assumption is not always realistic due to variations in hardware and measurement setups, creating what is known as the portability problem. Profiling multiple devices has been proposed as a solution, but obtaining access to these devices may pose a challenge for attackers. This article proposes a new approach to overcome the portability problem by introducing a neural network layer assessment methodology based on the ablation paradigm. This methodology evaluates the sensitivity and resilience of each layer, providing valuable knowledge to create a Multiple Device Model from Single Device (MDMSD). Specifically, it involves ablating a specific neural network section and performing recovery training. As a result, the profiling model, trained initially on a single device, can be generalized to leakage traces measured from various devices. By addressing the portability problem through a single device, practical side-channel attacks could be more accessible and effective for attackers.
Recently, attackers have targeted machine learning systems, introducing various attacks. The backdoor attack is popular in this field and is usually realized through data poisoning. To the best of our knowledge, we are the first to investigate whether the backdoor attacks remain effective when manifold learning algorithms are applied to the poisoned dataset. We conducted our experiments using two manifold learning techniques (Autoencoder and UMAP) on two benchmark datasets (MNIST and CIFAR10) and two backdoor strategies (clean and dirty label). We performed an array of experiments using different parameters, finding that we could reach an attack success rate of 95% and 75% even after reducing our data to two dimensions using Autoencoders and UMAP, respectively.
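To make the data-poisoning setting concrete, the sketch below shows a generic dirty-label poisoning step: a small trigger patch is stamped onto a fraction of the training images and their labels are switched to the attacker's target class. The patch shape, its position, the poisoning rate, and the helper names are assumptions for illustration and are not taken from the paper.

```python
import random
from typing import List, Tuple

Image = List[List[float]]


def stamp_trigger(image: Image, size: int = 2, value: float = 1.0) -> Image:
    """Return a copy of the image with a size x size patch in the bottom-right."""
    out = [row[:] for row in image]
    height, width = len(out), len(out[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            out[r][c] = value
    return out


def poison_dataset(images: List[Image], labels: List[int], target_label: int,
                   rate: float, seed: int = 0
                   ) -> Tuple[List[Image], List[int], List[int]]:
    """Dirty-label poisoning: trigger plus relabeling on a random subset."""
    rng = random.Random(seed)
    n_poison = int(rate * len(images))
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append(img)
            new_labels.append(lab)
    return new_images, new_labels, sorted(chosen)


# Toy example: ten blank 4x4 images of class 1, half poisoned toward class 7.
clean = [[[0.0] * 4 for _ in range(4)] for _ in range(10)]
p_images, p_labels, p_idx = poison_dataset(clean, [1] * 10, 7, rate=0.5)
```

A clean-label variant would add the trigger only to samples that already belong to the target class and leave the labels untouched.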
EmoBack
Backdoor Attacks Against Speaker Identification Using Emotional Prosody
Beyond PhantomSponges
Enhancing Sponge Attack on Object Detection Models
Given today's ongoing deployment of deep learning models, ensuring their security against adversarial attacks has become paramount. This paper introduces an enhanced version of the PhantomSponges attack by Shapira et al. The attack exploits the non-maximum suppression (NMS) algorithm in YOLO object detection (OD) models without compromising OD, substantially increasing inference time. Our enhancement focuses on improving the attack's impact on YOLOv5 models by modifying its bounding box area loss term, aiming to directly decrease the intersection over union and, thus, exacerbate the computational load on NMS. Through a parameter study using the Berkeley Deep Drive dataset, we evaluate the enhanced attack's efficacy against various sizes of YOLOv5, demonstrating, under certain circumstances, an improved capability to increase NMS time with a minimal loss in OD accuracy. Furthermore, we propose a novel defense that dynamically resizes input images to mitigate the attack's effectiveness, showcasing a substantial restoration in inference speed and OD accuracy. Our findings show that the enhanced attack could result in a 550% increase in NMS time on the YOLOv5 small configuration. Moreover, our defense's results show a substantial decrease of 90.18% in NMS execution time when applied to an attacked YOLOv5 large model.
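The link between intersection over union (IoU) and NMS workload can be seen in a plain greedy NMS loop: a candidate box is discarded only when its IoU with an already kept box exceeds a threshold, so many low-overlap candidates all survive and each costs further comparisons. The following is a minimal pure-Python sketch of generic greedy NMS, not YOLOv5's implementation; the `(x1, y1, x2, y2)` box format and the 0.5 threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Toy example: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, [0.9, 0.8, 0.7])
```

In this toy run the second box is suppressed by the first, while the distant third box survives; an input that yields many mutually low-IoU candidates would leave far more boxes for the loop to process.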
Unveiling the Threat
Investigating Distributed and Centralized Backdoor Attacks in Federated Graph Neural Networks
This article bridges this research gap by investigating two types of backdoor attacks in Federated GNNs: centralized backdoor attack (CBA) and distributed backdoor attack (DBA). Through extensive experiments, we demonstrate that DBA exhibits a higher success rate than CBA across various scenarios. To further explore the characteristics of these backdoor attacks in Federated GNNs, we evaluate their performance under different scenarios, including varying numbers of clients, trigger sizes, poisoning intensities, and trigger densities. Additionally, we explore the resilience of DBA and CBA against two defense mechanisms. Our findings reveal that neither defense can eliminate DBA or CBA without affecting the original task. This highlights the necessity of developing tailored defenses to mitigate the novel threat of backdoor attacks in Federated GNNs.
BAN
Detecting Backdoors Activated by Adversarial Neuron Noise
Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community. Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios. State-of-the-art backdoor inversion recovers a mask in the feature space to locate prominent backdoor features, where benign and backdoor features can be disentangled. However, it suffers from high computational overhead, and we also find that it overly relies on prominent backdoor features that are highly distinguishable from benign features. To tackle these shortcomings, this paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information. In particular, we adversarially increase the loss of backdoored models with respect to weights to activate the backdoor effect, based on which we can easily differentiate backdoored and clean models. Experimental results demonstrate that our defense, BAN, is 1.37× (on CIFAR-10) and 5.11× (on ImageNet200) more efficient, with an average 9.99% higher detection success rate than the state-of-the-art defense BTI-DBF. Our code and trained models are publicly available at https://github.com/xiaoyunxxy/ban.
Backdoor Pony
Evaluating backdoor attacks and defenses in different domains
Outsourced training and crowdsourced datasets lead to a new threat for deep learning models: the backdoor attack. In this attack, the adversary inserts a secret functionality in a model, activated through malicious inputs. Backdoor attacks represent an active research area due to diverse settings where they represent a real threat. Still, there is no framework to evaluate existing attacks and defenses in different domains. Only a few toolboxes have been implemented, but most of them focus on computer vision and are difficult to use. To bridge this gap, we implement Backdoor Pony, a framework for evaluating attacks and defenses in different domains through a user-friendly GUI.
The Need for Speed
A Fast Guessing Entropy Calculation for Deep Learning-Based SCA
The adoption of deep neural networks for profiling side-channel attacks opened new perspectives for leakage detection. Recent publications showed that cryptographic implementations featuring different countermeasures could be broken without feature selection or trace preprocessing. This success comes with a high price: an extensive hyperparameter search to find optimal deep learning models. As deep learning models usually suffer from overfitting due to their high fitting capacity, it is crucial to avoid over-training regimes, which requires choosing a correct number of epochs. For that, early stopping is employed as an efficient regularization method that requires a consistent validation metric. Although guessing entropy is a highly informative metric for profiling side-channel attacks, it is time-consuming to compute, especially when it is computed for all epochs during training and the number of validation traces is large. This paper shows that guessing entropy can be efficiently computed during training by reducing the number of validation traces without affecting the efficiency of early stopping decisions. Our solution significantly speeds up the process, impacting the performance of the hyperparameter search and overall profiling attack. Our fast guessing entropy calculation is up to 16× faster, resulting in more hyperparameter tuning experiments and allowing security evaluators to find more efficient deep learning models.
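Guessing entropy is commonly estimated as the average rank of the correct key over repeated attacks, each run on a random subset of validation traces. The sketch below follows that common formulation and exposes the subset size as a parameter, since the speed-up described above comes from using fewer validation traces. It assumes the model outputs have already been mapped to per-key log-probabilities; the function names and the rank convention (0 means the correct key is ranked first) are illustrative choices, not the paper's code.

```python
import random
from typing import List


def key_rank(log_probs: List[List[float]], trace_indices: List[int],
             correct_key: int) -> int:
    """Rank of the correct key after summing log-probabilities over traces."""
    n_keys = len(log_probs[0])
    totals = [0.0] * n_keys
    for t in trace_indices:
        row = log_probs[t]
        for k in range(n_keys):
            totals[k] += row[k]
    correct_score = totals[correct_key]
    # Number of key hypotheses that score strictly better than the correct one.
    return sum(1 for score in totals if score > correct_score)


def guessing_entropy(log_probs: List[List[float]], correct_key: int,
                     n_traces: int, n_attacks: int = 100,
                     seed: int = 0) -> float:
    """Average correct-key rank over attacks on random trace subsets."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_attacks):
        subset = rng.sample(range(len(log_probs)), n_traces)
        ranks.append(key_rank(log_probs, subset, correct_key))
    return sum(ranks) / len(ranks)


# Toy example: 4 key hypotheses, 10 identical traces favouring key 1.
toy = [[-1.0, -0.1, -2.0, -3.0] for _ in range(10)]
ge_best = guessing_entropy(toy, correct_key=1, n_traces=5, n_attacks=20)
ge_worst = guessing_entropy(toy, correct_key=3, n_traces=5, n_attacks=20)
```

The cost of one evaluation grows with `n_traces * n_attacks`, so computing the metric at every epoch becomes much cheaper when `n_traces` is reduced.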