AudiDoS

Real-time denial-of-service adversarial attacks on deep audio models

Conference Paper (2019)
Author(s)

Taesik Gong (Korea Advanced Institute of Science and Technology)

Alberto Gil C.P. Ramos (Nokia Bell Labs)

Sourav Bhattacharya (Nokia Bell Labs)

Akhil Mathur (Nokia Bell Labs)

Fahim Kawsar (Nokia Bell Labs, TU Delft - Knowledge and Intelligence Design)

Research Group
Knowledge and Intelligence Design
DOI (related publication)
https://doi.org/10.1109/ICMLA.2019.00167
Publication Year
2019
Language
English
Pages (from-to)
978-985
ISBN (electronic)
978-1-7281-4549-5

Abstract

Deep learning has enabled personal and IoT devices to rethink microphones as multi-purpose sensors for understanding conversation and the surrounding environment. This has resulted in a proliferation of Voice Controllable Systems (VCS) around us. The increasing popularity of such systems also attracts miscreants, who often want to take advantage of a VCS without the user's knowledge. Consequently, understanding the robustness of VCS, especially under adversarial attacks, has become an important research topic. Although there is some previous work on audio adversarial attacks, its scope is limited to embedding the attacks in pre-recorded music clips, which, when played through speakers, cause a VCS to misbehave. As the attack audio needs to be played, a human listener can suspect that this type of attack is taking place. In this paper, we focus on audio-based Denial-of-Service (DoS) attacks, which are unexplored in the literature. Contrary to previous work, we show that adversarial audio attacks are possible in real time and over the air while a user interacts with a VCS. We show that the attacks are effective regardless of the user's command and interaction timing. We present a first-of-its-kind imperceptible and always-on universal audio perturbation technique that enables such DoS attacks to succeed. We thoroughly evaluate the performance of the attacking scheme across (i) two learning tasks, (ii) two model architectures, and (iii) three datasets. We demonstrate that the attack can introduce an error rate as high as 78% in audio recognition tasks.
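
The abstract describes a universal, imperceptible perturbation that degrades recognition regardless of the user's command. The paper itself defines the actual objective, constraints, and architectures; as a rough illustration only, the sketch below shows one common way such a universal perturbation could be trained, using a placeholder PyTorch classifier, random stand-in data, and an assumed L-infinity loudness budget, none of which come from the paper.

# Hedged sketch (not the authors' code): training a single additive perturbation
# that raises the classification error of a fixed audio model on every input.
import torch
import torch.nn as nn

# Placeholder victim classifier: a tiny 1-D conv net over raw waveforms (hypothetical).
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 10),
)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # the attack only optimizes the perturbation

epsilon = 0.01                                          # assumed loudness budget (L-infinity)
delta = torch.zeros(1, 1, 16000, requires_grad=True)    # universal perturbation, 1 s at 16 kHz
optimizer = torch.optim.Adam([delta], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for recorded user commands and their labels.
waveforms = torch.randn(8, 1, 16000)
labels = torch.randint(0, 10, (8,))

for step in range(100):
    optimizer.zero_grad()
    logits = model(waveforms + delta)   # the same delta is added to every input
    loss = -loss_fn(logits, labels)     # ascend the loss to induce misclassification
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)  # keep the perturbation within the quiet budget

In a real over-the-air setting the perturbation would also need to survive playback, room acoustics, and microphone capture, which this sketch does not model.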

Metadata-only record; no files are available for this record.