S. Koffas

20 records found

Focused backdoor attack against federated transfer learning

Journal article (2026) - Marco Arazzi, Stefanos Koffas, Antonino Nocera, Stjepan Picek
Federated Transfer Learning (FTL) is the most general form of Federated Learning (FL). In FTL, one party, usually the server, pre-trains a feature extractor on public data. Then, clients collaboratively train a classifier by updating only the classification layers on their private data. This raises doubts about whether local poisoning attacks can effectively backdoor the full model. Unlike in FL, where attackers can shift model attention via poisoned inputs, FTL's fixed feature extractor, set during server pre-training, limits this possibility. In this paper, we investigate this scenario to identify and exploit a vulnerability by combining eXplainable AI (XAI) and dataset distillation. Our proposed attack can be carried out by one of the clients during the FL phase of FTL by identifying the optimal position for the trigger through XAI and encapsulating compressed information of the backdoor class. Due to its behavior, we refer to our approach as a focused backdoor approach (FB-FTL for short) and test its performance in image and text classification scenarios. Our attack is effective against existing defenses for FL, achieving an average attack success rate of 80%. ...
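To make "identifying the optimal position for the trigger through XAI" concrete, here is a minimal sketch that assumes a saliency map has already been produced by some XAI method (not computed here) and that "optimal" means the k x k window with the largest total attribution. Both assumptions are ours; the paper's actual selection criterion may differ.

```python
def best_trigger_position(saliency, k):
    """Return (row, col) of the top-left corner of the k x k window with the
    highest total saliency. `saliency` is a 2D list of non-negative floats,
    assumed to come from an XAI method (hypothetical input)."""
    h, w = len(saliency), len(saliency[0])
    if k > h or k > w:
        raise ValueError("window larger than saliency map")
    # 2D prefix sums: prefix[i][j] = sum of saliency over rows < i, cols < j
    prefix = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            prefix[i + 1][j + 1] = (saliency[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    best, best_pos = float("-inf"), (0, 0)
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            # Window sum in O(1) from the prefix table
            total = (prefix[r + k][c + k] - prefix[r][c + k]
                     - prefix[r + k][c] + prefix[r][c])
            if total > best:
                best, best_pos = total, (r, c)
    return best_pos


saliency = [[0, 0, 0, 0],
            [0, 5, 4, 0],
            [0, 3, 2, 0],
            [0, 0, 0, 0]]
print(best_trigger_position(saliency, 2))  # (1, 1)
```

The prefix-sum table keeps the search linear in the number of pixels, which matters if the map is recomputed for many candidate inputs.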
Conference paper (2026) - Behrad Tajalli, S. Koffas, S. Picek
Backdoor attacks in machine learning have drawn significant attention for their potential to compromise models stealthily, yet most research has focused on homogeneous data such as images. In this work, we propose a novel backdoor attack on tabular data, which is particularly challenging due to the presence of both numerical and categorical features. Our key idea is a novel technique to convert categorical values into floating-point representations. This approach preserves enough information to maintain clean-model accuracy compared to traditional methods like one-hot or ordinal encoding. By doing this, we create a gradient-based universal perturbation that applies to all features, including categorical ones.

We evaluate our method on five datasets and four popular models. Our results show up to a 100% attack success rate in both white-box and black-box settings (including real-world applications like Vertex AI), revealing a severe vulnerability for tabular data. Our method is shown to surpass previous work like Tabdoor in terms of performance, while remaining stealthy against state-of-the-art defense mechanisms. We evaluate our attack against Spectral Signatures, Neural Cleanse, Beatrix, and Fine-Pruning, all of which fail to defend successfully against it. We also verify that our attack successfully bypasses popular outlier detection mechanisms. ...
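The abstract does not describe the exact categorical-to-float conversion, so the sketch below swaps in a familiar stand-in, target encoding (each category maps to the mean label of rows carrying it), purely to show the overall mechanism: encode categories as floats, add one universal shift to every feature, then snap perturbed categorical values back to the nearest valid category. The codec, the feature names, and the shift values are hypothetical; computing the shift by gradient is not shown.

```python
from collections import defaultdict


def target_encoding_codec(values, labels):
    """Hypothetical codec: map each category to the mean binary label of the
    rows that carry it, and decode a float to the category with nearest code."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, y in zip(values, labels):
        sums[v] += y
        counts[v] += 1
    codes = {v: sums[v] / counts[v] for v in counts}

    def decode(x):
        # Snap a (possibly perturbed) float back to the closest category code.
        return min(codes, key=lambda v: abs(codes[v] - x))

    return codes, decode


def perturb_row(row, delta, codecs):
    """Apply the same (universal) shift `delta` to one row. Numeric features
    are shifted directly; categorical ones are shifted in code space and decoded."""
    out = {}
    for feat, value in row.items():
        shift = delta.get(feat, 0.0)
        if feat in codecs:
            codes, decode = codecs[feat]
            out[feat] = decode(codes[value] + shift)
        else:
            out[feat] = value + shift
    return out


codecs = {"color": target_encoding_codec(["a", "a", "b", "c"], [1, 0, 1, 0])}
print(perturb_row({"age": 30.0, "color": "a"}, {"age": 1.5, "color": 0.45}, codecs))
# {'age': 31.5, 'color': 'b'}
```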
Conference paper (2025) - Xiaoyun Xu, Zhuoran Liu, Stefanos Koffas, Stjepan Picek
Backdoor attacks maliciously inject covert functionality into machine learning models, representing a security threat. The stealthiness of backdoor attacks is a critical research direction, focusing on adversaries' efforts to enhance the resistance of backdoor attacks against defense mechanisms. Recent research on backdoor stealthiness focuses mainly on indistinguishable triggers in input space and inseparable backdoor representations in feature space, aiming to circumvent backdoor defenses that examine these respective spaces. However, existing backdoor attacks are typically designed to resist a specific type of backdoor defense without considering the diverse range of defense mechanisms. Based on this observation, we pose a natural question: Are current backdoor attacks truly a real-world threat when facing diverse practical defenses? To answer this question, we examine 12 common backdoor attacks that focus on input-space or feature-space stealthiness and 17 diverse representative defenses. Surprisingly, we reveal a critical blind spot: backdoor attacks designed to be stealthy in input and feature spaces can be mitigated by examining backdoored models in parameter space. To investigate the underlying causes behind this common vulnerability, we study the characteristics of backdoor attacks in the parameter space. Notably, we find that input- and feature-space attacks introduce prominent backdoor-related neurons in parameter space, which are not thoroughly considered by current backdoor attacks. Taking comprehensive stealthiness into account, we propose a novel supply-chain attack called Grond. Grond limits the parameter changes by a simple yet effective module, Adversarial Backdoor Injection (ABI), which adaptively increases the parameter-space stealthiness during the backdoor injection.
Extensive experiments demonstrate that Grond outperforms all 12 backdoor attacks against state-of-the-art (including adaptive) defenses on CIFAR10, GTSRB, and a subset of ImageNet. Additionally, we show that ABI consistently improves the effectiveness of common backdoor attacks. ...
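The abstract does not spell out how ABI limits parameter changes. As a loose, hypothetical illustration of the general idea of bounding how far weights drift during injection, the sketch below takes one gradient step on some backdoor-injection loss (gradients are supplied by the caller) and then clips every weight back into an L-infinity ball of radius `eps` around its pre-injection value. This projection is a generic constraint chosen for illustration, not ABI itself.

```python
def project_to_linf_ball(weights, reference, eps):
    """Clip each weight so it stays within eps of its reference
    (pre-injection) value, i.e., project onto an L-infinity ball."""
    return [min(max(w, r - eps), r + eps) for w, r in zip(weights, reference)]


def bounded_update(reference, weights, grads, lr, eps):
    """One gradient-descent step on a (hypothetical) backdoor-injection loss,
    followed by projection so no parameter drifts more than eps."""
    stepped = [w - lr * g for w, g in zip(weights, grads)]
    return project_to_linf_ball(stepped, reference, eps)


print(bounded_update([0.0, 1.0], [0.0, 1.0], [10.0, -1.0], lr=0.25, eps=0.5))
# [-0.5, 1.25]
```

Bounding per-parameter drift keeps the backdoored weights statistically close to the clean ones, which is the kind of parameter-space stealthiness the abstract refers to.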
Journal article (2025) - Jona te Lintelo, S. Koffas, S. Picek
Sponge attacks aim to increase the energy consumption and computation time of neural networks. In this work, we present a novel sponge attack called SkipSponge. SkipSponge is the first sponge attack that is performed directly on the parameters of a pretrained model using only a few data samples. Our experiments show that SkipSponge can successfully increase the energy consumption of image classification models, GANs, and autoencoders, requiring fewer samples than state-of-the-art sponge attacks (Sponge Poisoning).

We show that poisoning defenses are ineffective if not adjusted specifically for defense against SkipSponge (i.e., they decrease target layer bias values) and that SkipSponge is more effective on GANs and autoencoders than Sponge Poisoning. Additionally, SkipSponge is stealthy, as it does not require significant changes to the victim model’s parameters. Our experiments indicate that SkipSponge can be performed even when an attacker has access to less than 1% of the entire training dataset and reaches up to a 13% energy increase. ...
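One common proxy for energy in sponge-attack work is activation density: sparsity-aware accelerators can skip zero activations, so more non-zero ReLU outputs mean more work. The remark above that adjusted defenses decrease target-layer bias values suggests SkipSponge pushes those biases up. The toy layer below (weights, inputs, and the +1.0 shift are all hypothetical) only illustrates why raising biases raises density; it is not the SkipSponge algorithm.

```python
def relu_layer(inputs, weights, biases):
    """Dense layer followed by ReLU; weights[j] is the weight vector of neuron j."""
    return [max(0.0, sum(w * x for w, x in zip(wj, inputs)) + b)
            for wj, b in zip(weights, biases)]


def activation_density(batch, weights, biases):
    """Fraction of non-zero ReLU outputs over a batch."""
    outputs = [relu_layer(x, weights, biases) for x in batch]
    total = sum(len(o) for o in outputs)
    nonzero = sum(1 for o in outputs for v in o if v != 0.0)
    return nonzero / total


weights = [[1, -1], [-1, 1], [0.5, 0.5]]
batch = [[1, 0], [0, 1], [0.2, 0.2]]
before = activation_density(batch, weights, [0, 0, -1])
after = activation_density(batch, weights, [1, 1, 0])  # every bias raised by 1
print(before, after)  # 2/9 vs 7/9
```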
Conference paper (2025) - Simen Gaure, S. Koffas, S. Picek, Sondre Rønjom
Large Language Models (LLMs) are susceptible to various attacks but can also improve the security of diverse systems. However, how well do open-source LLMs behave as covertext distributions to, e.g., facilitate censorship-resistant communication? In this paper, we explore open-source LLM-based covert channels. We empirically measure the security vs. capacity of two open-source LLMs (Llama-7B and GPT-2) to assess their performance as covert channels. Although our results indicate that such channels are not likely to achieve high practical bitrates, we also show that the chance for an adversary to detect covert communication is low. To ensure our results can be used with the least effort as a general reference, we employ a conceptually simple and concise scheme and only assume public models. ...
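As an illustration of how a shared public model can carry hidden bits, here is the simplest textbook scheme (not necessarily the one used in the paper): at each step, bit 0 selects the model's most likely next token and bit 1 the runner-up; the receiver reruns the same model on the same prefix to read the bits back. The `toy_model` function is a hypothetical stand-in for an LLM's next-token distribution.

```python
from typing import Callable, List

# Maps a token prefix to a list of (token, probability) pairs.
NextTokenFn = Callable[[List[str]], List[tuple]]


def toy_model(prefix: List[str]) -> List[tuple]:
    """Hypothetical stand-in for an LLM: a fixed distribution that
    alternates with prefix length."""
    if len(prefix) % 2:
        return [("the", 0.5), ("a", 0.3), ("cat", 0.2)]
    return [("the", 0.5), ("a", 0.2), ("cat", 0.3)]


def top_two(model: NextTokenFn, prefix: List[str]) -> List[str]:
    # Sort by descending probability, breaking ties by token for determinism.
    ranked = sorted(model(prefix), key=lambda tp: (-tp[1], tp[0]))
    return [ranked[0][0], ranked[1][0]]


def embed(bits: str, model: NextTokenFn, prefix: List[str]) -> List[str]:
    """Hide one bit per generated token: 0 -> most likely, 1 -> runner-up."""
    text = list(prefix)
    for bit in bits:
        text.append(top_two(model, text)[int(bit)])
    return text[len(prefix):]


def extract(tokens: List[str], model: NextTokenFn, prefix: List[str]) -> str:
    """Recover the bits by replaying the same model on the same prefix."""
    text, bits = list(prefix), []
    for tok in tokens:
        bits.append(str(top_two(model, text).index(tok)))
        text.append(tok)
    return "".join(bits)


stego = embed("101", toy_model, ["once"])
print(stego, extract(stego, toy_model, ["once"]))  # ['a', 'the', 'a'] 101
```

This yields one bit per token at most, which is consistent in spirit with the low practical bitrates reported above; always choosing among the top two tokens is also a detectable bias, which is why real schemes sample more carefully.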
Conference paper (2024) - Christina Kreza, Stefanos Koffas, Behrad Tajalli, Mauro Conti, Stjepan Picek
Recently, attackers have targeted machine learning systems, introducing various attacks. The backdoor attack is popular in this field and is usually realized through data poisoning. To the best of our knowledge, we are the first to investigate whether the backdoor attacks remain effective when manifold learning algorithms are applied to the poisoned dataset. We conducted our experiments using two manifold learning techniques (Autoencoder and UMAP) on two benchmark datasets (MNIST and CIFAR10) and two backdoor strategies (clean and dirty label). We performed an array of experiments using different parameters, finding that we could reach an attack success rate of 95% and 75% even after reducing our data to two dimensions using Autoencoders and UMAP, respectively. ...
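The two backdoor strategies mentioned above differ only in which samples are poisoned and whether labels change. The sketch below, a generic illustration with a hypothetical pixel-stamp trigger on flattened inputs, shows both: dirty-label poisoning stamps random samples and relabels them to the target, while clean-label poisoning stamps only samples already in the target class. The manifold-learning step (Autoencoder or UMAP) that would follow is not shown.

```python
import random


def poison_dataset(samples, labels, target, rate, trigger, mode="dirty", seed=0):
    """Return poisoned copies of (samples, labels).

    samples: list of flat feature lists (e.g., flattened images in [0, 1]).
    trigger: dict index -> value to stamp (a hypothetical pixel pattern).
    mode="dirty": stamp random samples and relabel them to `target`.
    mode="clean": stamp only samples already labeled `target`; labels untouched.
    """
    rng = random.Random(seed)
    if mode == "dirty":
        candidates = list(range(len(samples)))
    elif mode == "clean":
        candidates = [i for i, y in enumerate(labels) if y == target]
    else:
        raise ValueError("mode must be 'dirty' or 'clean'")
    n_poison = min(len(candidates), round(rate * len(samples)))
    chosen = set(rng.sample(candidates, n_poison))
    new_x, new_y = [], []
    for i, (x, y) in enumerate(zip(samples, labels)):
        if i in chosen:
            x = list(x)  # copy so the original sample is not modified
            for idx, val in trigger.items():
                x[idx] = val
            y = target  # a no-op in clean-label mode, where y == target already
        new_x.append(x)
        new_y.append(y)
    return new_x, new_y
```

With `rate=0.2` on ten samples, exactly two samples receive the trigger in either mode.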

A Study on Backdoor Attacks on Extreme Learning Machines

Conference paper (2024) - Behrad Tajalli, Stefanos Koffas, Gorka Abad, Stjepan Picek
Due to their computational efficiency and speed during training and inference, extreme learning machines are suitable for simple learning tasks on lightweight datasets. Examples of their real-world applications include healthcare and edge devices, where examining security concerns is crucial. Backdoor attacks are among the most common security threats against machine learning models but are almost completely unexplored for extreme learning machines.

This paper investigates the effects of backdoor attacks on extreme learning machines. First, we inject the backdoor into the model through data and model poisoning and then examine pruning as a defense against the attack. The core characteristic of extreme learning machines, which makes them interesting for study, is their different structure and learning procedure compared to deep neural networks. These features raise the question of whether they are as vulnerable to backdoor attacks as deep neural networks. Our experiments confirm that they are and indicate that extreme learning machines can be backdoored with a 100% attack success rate. Thus, we believe further study is needed to develop robust defenses that make them less vulnerable. ...
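A common pruning heuristic (used, for example, by Fine-Pruning) ranks hidden neurons by their mean activation on clean data and removes the least active ones, on the assumption that backdoor behavior hides in neurons that clean inputs rarely use. The abstract does not state which pruning criterion was applied to extreme learning machines, so this sketch is only an assumed example. Here `output_weights[j]` is the hypothetical row of output weights attached to hidden neuron j.

```python
def prune_dormant_neurons(hidden_activations, n_prune):
    """Rank hidden neurons by mean activation on clean inputs and return the
    (sorted) indices of the n_prune least active ones.

    hidden_activations: list of per-sample activation vectors on clean data.
    """
    n_neurons = len(hidden_activations[0])
    means = [sum(a[j] for a in hidden_activations) / len(hidden_activations)
             for j in range(n_neurons)]
    order = sorted(range(n_neurons), key=lambda j: means[j])
    return sorted(order[:n_prune])


def apply_pruning(output_weights, pruned):
    """Zero the output weights of pruned hidden neurons, removing their
    contribution to every output."""
    pruned = set(pruned)
    return [[0.0] * len(row) if j in pruned else list(row)
            for j, row in enumerate(output_weights)]


clean_acts = [[0.9, 0.0, 0.5],
              [0.7, 0.1, 0.4]]
to_prune = prune_dormant_neurons(clean_acts, 1)       # [1]
print(apply_pruning([[1, 2], [3, 4], [5, 6]], to_prune))  # [[1, 2], [0.0, 0.0], [5, 6]]
```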

Backdoor Attacks Against Speaker Identification Using Emotional Prosody

Conference paper (2024) - Coen Schoof, Stefanos Koffas, Mauro Conti, Stjepan Picek
Speaker identification (SI) determines a speaker's identity based on their utterances. Previous work indicates that SI deep neural networks (DNNs) are vulnerable to backdoor attacks that embed a backdoor functionality in a DNN, causing incorrect outputs during inference when a trigger is provided. This is the first work exploring SI DNNs' vulnerability to backdoor attacks using speakers' emotional prosody, resulting in dynamic, inconspicuous triggers. We used three datasets and three DNN architectures to determine the impact of using emotions as backdoor triggers on the accuracy of SI DNNs. Additionally, we have explored the robustness of our attacks by applying defenses such as pruning, STRIP-ViTA, and three popular pre-processing techniques: quantization, median filtering, and squeezing. We show that the aforementioned models are prone to our attack (EmoBack), indicating that emotional triggers (i.e., the most effective being neutral, sad, angry, and surprised prosody) can be effectively used to compromise the integrity of SI DNNs. However, our pruning experiments suggest potential ways to reinforce backdoored models against our attacks across multiple emotions, decreasing the attack success rate up to 41.4%. ...
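Two of the pre-processing defenses named above are easy to state precisely. The sketch below gives generic versions for a mono waveform in [-1, 1]: uniform quantization to `2**bits` levels and a sliding-window median filter. The bit depth and window width are illustrative defaults, not the settings used in the paper.

```python
def quantize(signal, bits=8):
    """Round samples in [-1, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [round((x + 1) / 2 * levels) / levels * 2 - 1 for x in signal]


def median_filter(signal, width=3):
    """Replace each sample by the median of a centered window. Near the edges
    the window is truncated and the upper median is taken."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = sorted(signal[max(0, i - half): i + half + 1])
        out.append(window[len(window) // 2])
    return out


print(median_filter([0, 0, 9, 0, 0]))  # [0, 0, 0, 0, 0]: the isolated spike is removed
print(quantize([0.3, -0.3], bits=1))   # [1.0, -1.0]
```

Both transforms suppress short, low-energy details; prosody-based triggers spread over a whole utterance are less affected by them, which is one reason such triggers are hard to filter out.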

Enhancing Sponge Attack on Object Detection Models

Conference paper (2024) - Coen Schoof, Stefanos Koffas, Mauro Conti, Stjepan Picek
Given today's ongoing deployment of deep learning models, ensuring their security against adversarial attacks has become paramount. This paper introduces an enhanced version of the PhantomSponges attack by Shapira et al. The attack exploits the non-maximum suppression (NMS) algorithm in YOLO object detection (OD) models without compromising OD, substantially increasing inference time. Our enhancement focuses on improving the attack's impact on YOLOv5 models by modifying its bounding box area loss term, aiming to directly decrease the intersection over union and, thus, exacerbate the computational load on NMS. Through a parameter study using the Berkeley Deep Drive dataset, we evaluate the enhanced attack's efficacy against various sizes of YOLOv5, demonstrating, under certain circumstances, an improved capability to increase NMS time with a minimal loss in OD accuracy. Furthermore, we propose a novel defense that dynamically resizes input images to mitigate the attack's effectiveness, showcasing a substantial restoration in inference speed and OD accuracy. Our findings show that the enhanced attack could result in a 550% increase in NMS time on the YOLOv5 small configuration. Moreover, our defense's results show a substantial decrease of 90.18% in NMS execution time when applied to an attacked YOLOv5 large model. ...
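Why does lowering intersection over union (IoU) slow down NMS? Greedy NMS keeps the highest-scoring box and discards every remaining box that overlaps it by more than a threshold, so when candidate boxes barely overlap, few are discarded and the number of pairwise IoU checks approaches n(n-1)/2. The sketch below is a plain reference implementation of greedy NMS (not YOLOv5's own code) that also counts those comparisons.

```python
def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS. Returns kept indices and the number of IoU computations
    performed (a rough proxy for NMS cost)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, comparisons = [], 0
    while order:
        best = order.pop(0)
        keep.append(best)
        survivors = []
        for i in order:
            comparisons += 1
            if iou(boxes[best], boxes[i]) <= iou_thresh:
                survivors.append(i)
        order = survivors
    return keep, comparisons


overlapping = [(0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 9, 9)]
print(greedy_nms(overlapping, [0.9, 0.8, 0.7]))  # ([0], 2)
disjoint = [(0, 0, 1, 1), (2, 2, 3, 3), (4, 4, 5, 5), (6, 6, 7, 7)]
print(greedy_nms(disjoint, [0.9, 0.8, 0.7, 0.6]))  # ([0, 1, 2, 3], 6)
```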

Detecting Backdoors Activated by Adversarial Neuron Noise

Conference paper (2024) - Xiaoyun Xu, Zhuoran Liu, Stefanos Koffas, Shujian Yu, Stjepan Picek
Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community. Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios. State-of-the-art backdoor inversion recovers a mask in the feature space to locate prominent backdoor features, where benign and backdoor features can be disentangled. However, it suffers from high computational overhead, and we also find that it overly relies on prominent backdoor features that are highly distinguishable from benign features. To tackle these shortcomings, this paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information. In particular, we adversarially increase the loss of backdoored models with respect to weights to activate the backdoor effect, based on which we can easily differentiate backdoored and clean models. Experimental results demonstrate our defense, BAN, is 1.37× (on CIFAR-10) and 5.11× (on ImageNet200) more efficient with an average 9.99% higher detection success rate than the state-of-the-art defense BTI-DBF. Our code and trained models are publicly available at https://github.com/xiaoyunxxy/ban. ...
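"Adversarially increasing the loss with respect to weights" can be illustrated on a toy logistic model: take one signed gradient-ascent step on the weights, bounded by `eps` in the L-infinity norm. Because logistic loss is convex in the weights, this step is guaranteed to raise the loss whenever the gradient is non-zero. The model, data, and `eps` are hypothetical, and the step that compares perturbed and clean behavior to flag a backdoor is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w, data):
    """Mean binary cross-entropy of a bias-free logistic model on (x, y) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def adversarial_weight_noise(w, data, eps):
    """Shift every weight by eps in the sign of the loss gradient
    (one L-infinity-bounded gradient-ascent step on the weights)."""
    grad = [0.0] * len(w)
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj / len(data)
    return [wi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for wi, g in zip(w, grad)]


w = [1.0, -1.0]
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
w_adv = adversarial_weight_noise(w, data, eps=0.1)
print(w_adv, bce_loss(w, data) < bce_loss(w_adv, data))  # approx [0.9, -0.9] True
```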
Journal article (2024) - Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Stefanos Koffas, Yiming Li
Deep neural networks (DNNs) have been widely and successfully adopted and deployed in various applications of speech recognition. Recently, a few works revealed that these models are vulnerable to backdoor attacks, where the adversaries can implant malicious prediction behaviors into victim models by poisoning their training process. In this paper, we revisit poison-only backdoor attacks against speech recognition. We reveal that existing methods are not stealthy since their trigger patterns are perceptible to humans or machine detection. This limitation is mostly because their trigger patterns are simple noises or separable and distinctive clips. Motivated by these findings, we propose to exploit elements of sound (e.g., pitch and timbre) to design more stealthy yet effective poison-only backdoor attacks. Specifically, we insert a short-duration high-pitched signal as the trigger and increase the pitch of remaining audio clips to ‘mask’ it for designing stealthy pitch-based triggers. We manipulate timbre features of victim audio to design the stealthy timbre-based attack and design a voiceprint selection module to facilitate the multi-backdoor attack. Our attacks can generate more ‘natural’ poisoned samples and therefore are more stealthy. Extensive experiments are conducted on benchmark datasets, which verify the effectiveness of our attacks under different settings (e.g., all-to-one, all-to-all, clean-label, physical, and multi-backdoor settings) and their stealthiness. Our methods achieve attack success rates of over 95% in most cases and are nearly undetectable. The code for reproducing the main experiments is available at https://github.com/HanboCai/BadSpeech_SoE. ...
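The first half of the pitch-based trigger, inserting a short high-pitched signal, can be sketched as adding a brief sine tone to a waveform. The frequency, amplitude, and timing below are illustrative assumptions, and the second half of the method (raising the pitch of the rest of the clip to mask the tone) is not shown.

```python
import math


def insert_tone_trigger(signal, sample_rate, freq_hz, start_s, duration_s,
                        amplitude=0.1):
    """Add a short sine tone (the trigger) to a mono waveform in [-1, 1].

    Only samples in [start_s, start_s + duration_s) change; results are
    clipped to [-1, 1].
    """
    if freq_hz >= sample_rate / 2:
        raise ValueError("tone frequency must be below the Nyquist frequency")
    out = list(signal)
    start = int(start_s * sample_rate)
    end = min(len(out), start + int(duration_s * sample_rate))
    for n in range(start, end):
        t = (n - start) / sample_rate
        tone = amplitude * math.sin(2 * math.pi * freq_hz * t)
        out[n] = max(-1.0, min(1.0, out[n] + tone))
    return out


# One second of silence at 16 kHz with a 100 ms, 7 kHz tone starting at 0.5 s
clip = insert_tone_trigger([0.0] * 16000, 16000, 7000, 0.5, 0.1)
```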

Investigating Distributed and Centralized Backdoor Attacks in Federated Graph Neural Networks

Journal article (2024) - Jing Xu, Stefanos Koffas, Stjepan Picek
Graph neural networks (GNNs) have gained significant popularity as powerful deep learning methods for processing graph data. However, centralized GNNs face challenges in data-sensitive scenarios due to privacy concerns and regulatory restrictions. Federated learning has emerged as a promising technology that enables collaborative training of a shared global model while preserving privacy. Although federated learning has been applied to train GNNs, no research focuses on the robustness of Federated GNNs against backdoor attacks.

This article bridges this research gap by investigating two types of backdoor attacks in Federated GNNs: centralized backdoor attack (CBA) and distributed backdoor attack (DBA). Through extensive experiments, we demonstrate that DBA exhibits a higher success rate than CBA across various scenarios. To further explore the characteristics of these backdoor attacks in Federated GNNs, we evaluate their performance under different scenarios, including varying numbers of clients, trigger sizes, poisoning intensities, and trigger densities. Additionally, we explore the resilience of DBA and CBA against two defense mechanisms. Our findings reveal that neither defense can eliminate DBA or CBA without affecting the original task. This highlights the necessity of developing tailored defenses to mitigate the novel threat of backdoor attacks in Federated GNNs. ...
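The difference between the two attacks lies in how the trigger is shared among malicious clients. In CBA, one client poisons with the whole global trigger; in DBA, the global trigger is split into disjoint local triggers, one per malicious client. The sketch below represents a trigger as a hypothetical list of node or feature indices and splits it round-robin; the papers' actual splitting rule for subgraph triggers may differ.

```python
def split_trigger(global_trigger, n_attackers):
    """Split a global trigger (a list of node or feature indices, a hypothetical
    representation) into n_attackers disjoint local triggers, round-robin."""
    if n_attackers < 1:
        raise ValueError("need at least one attacker")
    parts = [[] for _ in range(n_attackers)]
    for i, element in enumerate(global_trigger):
        parts[i % n_attackers].append(element)
    return parts


local_triggers = split_trigger([0, 1, 2, 3, 4], 2)
print(local_triggers)  # [[0, 2, 4], [1, 3]]
```

The union of the local triggers equals the global trigger, which is what the attacker typically uses at test time under both strategies.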

Evaluating backdoor attacks and defenses in different domains

Outsourced training and crowdsourced datasets lead to a new threat for deep learning models: the backdoor attack. In this attack, the adversary inserts a secret functionality in a model, activated through malicious inputs. Backdoor attacks represent an active research area due to diverse settings where they represent a real threat. Still, there is no framework to evaluate existing attacks and defenses in different domains. Only a few toolboxes have been implemented, but most of them focus on computer vision and are difficult to use. To bridge this gap, we implement Backdoor Pony, a framework for evaluating attacks and defenses in different domains through a user-friendly GUI. ...
Book chapter (2023) - Stefanos Koffas, Behrad Tajalli, Jing Xu, Mauro Conti, Stjepan Picek
Deep learning has found its place in various real-world applications, many of which have security requirements. Unfortunately, as these systems become more pervasive, understanding how they fail becomes more challenging. While there are multiple failure modes in machine learning, one category has received significant attention in the last few years: backdoor attacks. Backdoor attacks aim to make a model misclassify some of its inputs to a specific preset label while classifying all other inputs normally. While many works investigate various backdoor attacks and defenses for different domains, no works aim to provide a systematic comparison of backdoor attacks for different scenarios. This work considers backdoor attacks in image, sound, text, and graph domains and provides a comparative analysis of their respective strengths. ...
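Comparisons across domains such as these usually rest on the attack success rate (ASR). One common convention, sketched below, computes ASR only over triggered inputs whose true label is not already the target class, so those inputs do not inflate the score. The `predict` callable and the toy data are hypothetical.

```python
def attack_success_rate(predict, triggered_inputs, true_labels, target):
    """Share of triggered inputs (excluding those whose true label is already
    the target class) that the model classifies as the target class."""
    relevant = [x for x, y in zip(triggered_inputs, true_labels) if y != target]
    if not relevant:
        raise ValueError("no non-target samples to evaluate")
    hits = sum(1 for x in relevant if predict(x) == target)
    return hits / len(relevant)


# Toy model that outputs the target class 3 for every input except 2
asr = attack_success_rate(lambda x: 3 if x != 2 else 2, [0, 1, 2, 3], [0, 1, 2, 3], 3)
print(asr)  # 0.666...: two of the three non-target inputs are hijacked
```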

Audio Backdoors Through Stylistic Transformations

Conference paper (2023) - Stefanos Koffas, Luca Pajola, Stjepan Picek, Mauro Conti
This work explores stylistic triggers for backdoor attacks in the audio domain: dynamic transformations of malicious samples through guitar effects. We first formalize stylistic triggers, which are currently missing in the literature. Second, we explore how to develop stylistic triggers in the audio domain by proposing JingleBack. Our experiments confirm the effectiveness of the attack, achieving a 96% attack success rate. Our code is available at https://github.com/skoffas/going-in-style. ...
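A stylistic trigger is a transformation applied to the whole sample rather than a fixed pattern added to it. As a stand-in for the guitar effects mentioned above (the specific effects and tooling are not listed here), the sketch below implements a simple hand-written overdrive: amplify, soft-clip with `tanh`, and blend with the dry signal. The gain and mix values are illustrative.

```python
import math


def distortion(signal, gain=10.0, mix=1.0):
    """Overdrive-style effect for samples in [-1, 1]: amplify, soft-clip with
    tanh, normalize so |wet| <= 1, then blend wet and dry by `mix` in [0, 1]."""
    out = []
    for x in signal:
        wet = math.tanh(gain * x) / math.tanh(gain)
        out.append((1 - mix) * x + mix * wet)
    return out


styled = distortion([-1.0, -0.5, 0.0, 0.5, 1.0])
```

In a poisoning setting, each selected sample would be passed through such an effect and relabeled to the target class; the effect changes the timbre of the whole clip, so there is no fixed trigger position or pattern to detect.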
Conference paper (2023) - Jing Xu, Stefanos Koffas, Oǧuzhan Ersoy, Stjepan Picek
Graph Neural Networks (GNNs) have achieved promising performance in various real-world applications. Building a powerful GNN model is not a trivial task, as it requires a large amount of training data, powerful computing resources, and human expertise. Moreover, with the development of adversarial attacks, e.g., model stealing attacks, GNNs raise challenges to model authentication. To avoid copyright infringement on GNNs, verifying the ownership of the GNN models is necessary. This paper presents a watermarking framework for GNNs for both graph and node classification tasks. We 1) design two strategies to generate watermarked data for the graph classification task and one for the node classification task, 2) embed the watermark into the host model through training to obtain the watermarked GNN model, and 3) verify the ownership of the suspicious model in a black-box setting. The experiments show that our framework can verify the ownership of GNN models with a very high probability (up to 99%) for both tasks. We also explore our watermarking mechanism against an adaptive attacker with access to partial knowledge of the watermarked data. Finally, we experimentally show that our watermarking approach is robust against a state-of-the-art model extraction technique and four state-of-the-art defenses against backdoor attacks. ...
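Step 3 above, black-box ownership verification, amounts to querying the suspicious model on the watermarked inputs and checking how often it returns the watermark labels. The sketch below shows that check; the 0.9 decision threshold and the toy models are hypothetical choices, not values from the paper.

```python
def verify_ownership(suspect_predict, watermark_inputs, watermark_labels,
                     threshold=0.9):
    """Black-box check: claim ownership if the suspect model reproduces the
    watermark labels on at least `threshold` of the watermark inputs.
    Returns (decision, match_rate)."""
    matches = sum(1 for x, y in zip(watermark_inputs, watermark_labels)
                  if suspect_predict(x) == y)
    rate = matches / len(watermark_labels)
    return rate >= threshold, rate


wm_inputs, wm_labels = list(range(10)), [7] * 10
print(verify_ownership(lambda x: 7, wm_inputs, wm_labels))      # (True, 1.0)
print(verify_ownership(lambda x: x % 3, wm_inputs, wm_labels))  # (False, 0.0)
```

Only query access is needed, which matches the black-box setting described above; the threshold trades false ownership claims against missed detections.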
Conference paper (2022) - Stefanos Koffas, Praveen Kumar Vadnala
We investigate the influence of clock frequency on the success rate of a fault injection attack. In particular, we examine the success rate of voltage and electromagnetic fault attacks for varying clock frequencies. Using three different tests that cover different components of a System-on-Chip, we perform fault injection while its CPU operates at different clock frequencies. Our results show that the attack’s success rate increases with an increase in clock frequency for both voltage and EM fault injection attacks. As the technology advances push the clock frequency further, these results can help assess the impact of fault injection attacks more accurately and develop appropriate countermeasures to address them. ...
Conference paper (2022) - Stefanos Koffas, Jing Xu, Mauro Conti, Stjepan Picek
This work explores backdoor attacks for automatic speech recognition systems where we inject inaudible triggers. By doing so, we make the backdoor attack challenging to detect for legitimate users and, consequently, potentially more dangerous. We conduct experiments on two versions of a speech dataset and three neural networks and explore the performance of our attack with respect to the duration, position, and type of the trigger. Our results indicate that less than 1% of poisoned data is sufficient to deploy a backdoor attack and reach a 100% attack success rate. We observed that short, non-continuous triggers result in highly successful attacks. Still, since our trigger is inaudible, it can be as long as possible without raising suspicion, making the attack more effective. Finally, we conduct our attack on actual hardware and show that an adversary could manipulate inference in an Android application by playing the inaudible trigger over the air. ...
Conference paper (2022) - Stefanos Koffas, Stjepan Picek, Mauro Conti
Outsourced training and machine learning as a service have resulted in novel attack vectors like backdoor attacks. Such attacks embed a secret functionality in a neural network that is activated when the trigger is added to its input. In most works in the literature, the trigger is static, both in terms of location and pattern. The effectiveness of various detection mechanisms depends on this property. It was recently shown that countermeasures in image classification, like Neural Cleanse and ABS, could be bypassed with dynamic triggers that are effective regardless of their pattern and location. Still, such backdoors are demanding as they require a large percentage of poisoned training data. In this work, we are the first to show that dynamic backdoor attacks could happen due to a global average pooling layer without increasing the percentage of the poisoned training data. Nevertheless, our experiments in sound classification, text sentiment analysis, and image classification show this to be very difficult in practice. ...
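The intuition for why a global average pooling layer can make a backdoor location-independent: pooling averages over all spatial positions, so a feature-map response to the trigger contributes the same pooled value wherever it appears (provided the response lies fully inside the map). The sketch below illustrates only this invariance on hand-made feature maps; it does not reproduce the paper's experiments.

```python
def global_average_pool(feature_map):
    """Average all activations of one 2D feature map into a single value."""
    total = sum(sum(row) for row in feature_map)
    return total / (len(feature_map) * len(feature_map[0]))


def place_patch(height, width, patch, top, left):
    """Return a zero feature map with `patch` written at (top, left)."""
    fmap = [[0.0] * width for _ in range(height)]
    for i, row in enumerate(patch):
        for j, v in enumerate(row):
            fmap[top + i][left + j] = v
    return fmap


response = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical trigger response
a = global_average_pool(place_patch(6, 6, response, 0, 0))
b = global_average_pool(place_patch(6, 6, response, 3, 4))
print(a == b)  # True: the pooled value ignores where the response occurred
```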
Conference paper (2022) - J. Xu, R. Wang, S. Koffas, K. Liang, S. Picek
Graph Neural Networks (GNNs) are a class of deep learning-based methods for processing graph domain information. GNNs have recently become a widely used graph analysis method due to their superior ability to learn representations for complex graph data. Due to privacy concerns and regulation restrictions, centralized GNNs can be difficult to apply to data-sensitive scenarios. Federated learning (FL) is an emerging technology developed for privacy-preserving settings when several parties need to train a shared global model collaboratively. Although several research works have applied FL to train GNNs (Federated GNNs), there is no research on their robustness to backdoor attacks.

This paper bridges this gap by conducting two types of backdoor attacks in Federated GNNs: centralized backdoor attacks (CBA) and distributed backdoor attacks (DBA). Our experiments show that the DBA attack success rate is higher than CBA in almost all cases. For CBA, the attack success rate of all local triggers is similar to that of the global trigger, even if the training set of the adversarial party is embedded with the global trigger. To explore the properties of two backdoor attacks in Federated GNNs, we evaluate the attack performance for a different number of clients, trigger sizes, poisoning intensities, and trigger densities. Finally, we explore the robustness of DBA and CBA against two state-of-the-art defenses. We find that both attacks are robust against the investigated defenses, showing the need to treat backdoor attacks in Federated GNNs as a novel threat that requires custom defenses. ...