Imperceptible Backdoor Attacks on Deep Regression Using the WaNet Method
Using Warping-Based Poisoned Networks to Covertly Compromise a Deep Regression Model
A.A. Styslavski (TU Delft - Electrical Engineering, Mathematics and Computer Science)
L. Du – Mentor (TU Delft - Embedded Systems)
L. A.N. Guohao – Mentor (TU Delft - Embedded Systems)
Sicco Verwer – Graduation committee member (TU Delft - Algorithmics)
Abstract
Deep Regression Models (DRMs) are a subset of deep learning models that output continuous values. Due to their performance, DRMs are widely used as critical components in various systems. As training a DRM is resource-intensive, many rely on pre-trained third-party models, which can leave a worrying number of systems vulnerable to backdoor attacks. A backdoored model is an otherwise legitimate model that maliciously changes its behaviour whenever a predetermined backdoor trigger is present. While numerous works on backdoor attacks against deep learning models focus on classification problems, very little work has addressed DRMs. We formulate and evaluate a backdoor attack on a DRM using WaNet, a method that relies on warping-based triggers that are difficult to detect by both human inspection and machine defence methods. We successfully train a backdoored (poisoned) DRM whose backdoor works for both grayscale and coloured inputs. Further experiments show that the malicious backdoor behaviour can be subdued by fine-tuning the poisoned model.
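As background for readers of the abstract: in a WaNet-style attack, the trigger is a fixed, smooth, image-wide warping field rather than a visible patch, which is what makes it imperceptible. The sketch below is a minimal, hypothetical PyTorch illustration of such a warping trigger; the function names and the `k` and `strength` parameters are illustrative assumptions, not taken from the thesis.

```python
import torch
import torch.nn.functional as F

def make_warp_grid(img_size: int, k: int = 4, strength: float = 0.5) -> torch.Tensor:
    """Build a fixed, smooth warping field in the style of WaNet.

    A small k x k grid of random offsets is upsampled to the image size,
    so the resulting distortion is low-frequency and hard to spot by eye.
    """
    # Random control points in [-1, 1], normalised by their mean magnitude.
    noise = torch.rand(1, 2, k, k) * 2 - 1
    noise = noise / noise.abs().mean()
    # Upsample to a per-pixel flow field of shape (1, H, W, 2).
    flow = F.interpolate(noise, size=(img_size, img_size),
                         mode="bicubic", align_corners=True)
    flow = flow.permute(0, 2, 3, 1)
    # Identity sampling grid in [-1, 1] (the coordinate convention of grid_sample).
    coords = torch.linspace(-1, 1, img_size)
    yy, xx = torch.meshgrid(coords, coords, indexing="ij")
    identity = torch.stack((xx, yy), dim=-1).unsqueeze(0)
    # Offset the identity grid by the scaled flow, keeping coordinates valid.
    return torch.clamp(identity + strength * flow / img_size, -1, 1)

def apply_trigger(images: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
    """Warp a batch of images (N, C, H, W) with the fixed trigger grid."""
    return F.grid_sample(images, grid.expand(images.size(0), -1, -1, -1),
                         align_corners=True)
```

Under this sketch, poisoning would warp a small fraction of the training images with the fixed grid and pair them with an attacker-chosen regression target; at inference, applying the same grid activates the backdoor while clean inputs behave normally.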