Backdoor Attacks in Neural Networks

Master Thesis (2021)
Author(s)

S. Koffas (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Stjepan Picek – Mentor (TU Delft - Cyber Security)

Reginald L. Lagendijk – Graduation committee member (TU Delft - Cyber Security)

Myrthe L. Tielman – Graduation committee member (TU Delft - Interactive Intelligence)

Publication Year
2021
Language
English
Copyright
© 2021 Stefanos Koffas
Graduation Date
04-06-2021
Awarding Institution
Delft University of Technology
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deep learning has achieved tremendous success in the past decade. As a result, it is being widely deployed in safety- and security-critical applications such as autonomous driving, malware detection, fingerprint identification, and financial fraud detection. It was recently shown that deep neural networks are susceptible to multiple attacks that pose a serious security threat to such systems. One such attack is the backdoor attack. In this attack, the model's training is partially or fully outsourced to a malicious party. The adversary designs a secret property, the trigger, and embeds it into a small fraction of the training data so that the trained model associates this property with a target label chosen by the adversary. The backdoored model behaves well on regular inputs, but when the trigger is present, a misclassification occurs with very high probability. The advent of this attack sparked an arms race in the deep learning community, resulting in numerous backdoor attacks and defenses in recent years. In this thesis, we conduct a systematic evaluation that pushes our knowledge further, aiming at more effective defenses in the future, across applications such as image classification, natural language processing, and sound recognition. We show that the trigger's size is positively correlated with the attack success rate in almost all cases. In contrast, the trigger's position is not always connected to the attack success rate and depends on the neural network used. Furthermore, we are the first to experiment with inaudible triggers in sound recognition and show that they can be used for stealthy real-world attacks. Finally, we argue that backdoor attacks can serve as a framework that further advances our understanding of how deep neural networks learn.
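To make the poisoning step concrete, the following is a minimal sketch of how such a trigger could be embedded, assuming an image-classification setting: a small pixel patch is stamped onto a randomly chosen fraction of the training images, and those images are relabeled with the adversary's target class. The function name, patch size, and poisoning rate are illustrative assumptions, not the exact setup used in the thesis.

```python
import numpy as np

def poison_dataset(images, labels, target_label, poison_rate=0.05,
                   patch_size=3, patch_value=1.0):
    """Stamp a small square trigger onto a fraction of the images and
    relabel them with the target class (illustrative sketch only)."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = np.random.choice(len(images), n_poison, replace=False)
    for i in idx:
        # fixed trigger position: bottom-right corner of the image
        images[i, -patch_size:, -patch_size:] = patch_value
        labels[i] = target_label
    return images, labels

# toy example: poison 5% of a fake 28x28 grayscale dataset towards class 7
x = np.random.rand(1000, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
x_poisoned, y_poisoned = poison_dataset(x, y, target_label=7)
```

Training on the poisoned set then yields a model that behaves normally on clean inputs but predicts the target class whenever the patch appears.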
Most existing attacks focus on computer vision, and only a few experiment with natural language processing and sound recognition. Furthermore, most defenses are empirical and can be bypassed by slightly altering the attack scenario. We also exploited the properties of global average pooling to build a more effective attack. In particular, we created dynamic backdoors that remain effective even when different trigger positions are used during training and inference, without poisoning more data or altering the poisoning procedure. The exploration of this layer also showed that a model that generalizes well on new inputs is less likely to be susceptible to backdoor attacks.
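As a rough illustration of the dynamic-backdoor idea above, the variant below places the same trigger at a random location in every poisoned sample; an architecture whose convolutional features are spatially averaged by global average pooling can then still learn the trigger-label association without extra poisoned data. The helper name and parameters are again assumptions made for illustration, not the thesis's exact procedure.

```python
import numpy as np

def poison_dynamic(images, labels, target_label, poison_rate=0.05,
                   patch_size=3, patch_value=1.0):
    """Same idea as a static patch trigger, but the trigger position is drawn
    at random per sample, which global-average-pooling models can tolerate
    (illustrative sketch only)."""
    images, labels = images.copy(), labels.copy()
    h, w = images.shape[1], images.shape[2]
    idx = np.random.choice(len(images), int(poison_rate * len(images)),
                           replace=False)
    for i in idx:
        # random trigger position instead of a fixed corner
        r = np.random.randint(0, h - patch_size + 1)
        c = np.random.randint(0, w - patch_size + 1)
        images[i, r:r + patch_size, c:c + patch_size] = patch_value
        labels[i] = target_label
    return images, labels
```

At inference time, the trigger can then be placed anywhere in the input rather than only at the location seen during training.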

Files

Thesis_stefanos_koffas.pdf
(pdf | 28.3 MB)
License info not available