EmoBack

Backdoor Attacks Against Speaker Identification Using Emotional Prosody

Conference Paper (2024)
Authors

Coen Schoof (Radboud Universiteit Nijmegen)

Stefanos Koffas (TU Delft - Cyber Security)

Mauro Conti (Università degli Studi di Padova)

S. Picek (TU Delft - Cyber Security, Radboud Universiteit Nijmegen)

Research Group
Cyber Security
To reference this document use:
https://doi.org/10.1145/3689932.3694773
More Info
Publication Year
2024
Language
English
Pages (from-to)
137-148
ISBN (electronic)
979-8-4007-1228-9
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Speaker identification (SI) determines a speaker's identity based on their utterances. Previous work indicates that SI deep neural networks (DNNs) are vulnerable to backdoor attacks, which embed hidden functionality in a DNN that causes incorrect outputs during inference when a trigger is present. This is the first work exploring SI DNNs' vulnerability to backdoor attacks that use speakers' emotional prosody, resulting in dynamic, inconspicuous triggers. We used three datasets and three DNN architectures to determine the impact of using emotions as backdoor triggers on the accuracy of SI DNNs. Additionally, we explored the robustness of our attacks by applying defenses such as pruning, STRIP-ViTA, and three popular pre-processing techniques: quantization, median filtering, and squeezing. We show that the aforementioned models are prone to our attack (EmoBack), indicating that emotional triggers (the most effective being neutral, sad, angry, and surprised prosody) can be used effectively to compromise the integrity of SI DNNs. However, our pruning experiments suggest potential ways to reinforce backdoored models against our attacks across multiple emotions, decreasing the attack success rate up to 41.4%.
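For readers unfamiliar with the pre-processing defenses named in the abstract, the following is a minimal sketch of two of them, quantization and median filtering, applied to a raw waveform before it reaches the SI model. It is not the paper's implementation: the function names `quantize` and `median_filter`, the `step` and `kernel` parameters, and the sample values are assumptions chosen for illustration. Squeezing is left out because the abstract does not define it.

```python
from statistics import median


def quantize(samples, step):
    """Round every sample to the nearest multiple of `step`.

    Coarser amplitude resolution discards small perturbations.
    `step` is a hypothetical parameter, not a value from the paper.
    """
    return [round(s / step) * step for s in samples]


def median_filter(samples, kernel=3):
    """Replace every sample with the median of a window centred on it.

    Windows are truncated at the edges of the signal, so an edge window
    may hold an even number of samples (the median is then the mean of
    the two middle values).
    """
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    half = kernel // 2
    filtered = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        filtered.append(median(samples[lo:hi]))
    return filtered


# Toy 16-bit-style waveform (made-up values, for illustration only).
waveform = [0, 130, 250, -130, 4000, -90]
cleaned = median_filter(quantize(waveform, step=256), kernel=3)
```

Both transforms act on low-level signal detail. Whether they affect a trigger carried by prosody rather than by an added sound is the kind of question the robustness experiments in the paper address.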