Imperceptible Backdoor Attacks on Deep Regression Using the WaNet Method

Using Warping-Based Poisoned Networks to Covertly Compromise a Deep Regression Model


Abstract

Deep Regression Models (DRMs) are a subset of deep learning models that output continuous values. Owing to their strong performance, DRMs are widely used as critical components in various systems. Because training a DRM is resource-intensive, many practitioners rely on pre-trained third-party models, which can leave a worrying number of systems vulnerable to backdoor attacks. A backdoored model is an otherwise legitimate model that maliciously changes its behaviour whenever a predetermined backdoor trigger is present. While numerous works on backdoor attacks against deep learning models focus on classification problems, very little work has focused on DRMs. We formulate and evaluate a backdoor attack on a DRM using WaNet, a method that relies on warping-based triggers that are difficult for both humans and machine defence methods to detect. We successfully train a backdoored (poisoned) DRM whose backdoor works for both grayscale and coloured inputs. Further experiments show that the malicious backdoor behaviour can be subdued by fine-tuning the poisoned model.
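To make the idea of a warping-based trigger more concrete, the following is a minimal NumPy sketch of a WaNet-style trigger, assuming square images with pixel values in [0, 1]. It follows the general WaNet recipe: a small random grid of control offsets is normalised and upsampled into a smooth displacement field, which is added to the identity sampling grid, clipped to the image bounds, and applied with bilinear resampling. The function names `make_warp_field` and `warp`, the parameters `k` and `strength`, and the use of bilinear rather than bicubic upsampling are illustrative assumptions. This is not the implementation used in this work.

```python
import numpy as np


def make_warp_field(k, size, strength, rng):
    """Hypothetical helper: smooth (size, size, 2) displacement field in [-1, 1] coords.

    A random k x k grid of control offsets is normalised by its mean absolute value,
    upsampled bilinearly (align-corners style) to the image size, then scaled.
    """
    ctrl = rng.uniform(-1.0, 1.0, size=(k, k, 2))
    ctrl = ctrl / np.mean(np.abs(ctrl))
    pos = np.linspace(0.0, k - 1, size)  # sample positions on the control grid
    i0 = np.floor(pos).astype(int)
    i1 = np.minimum(i0 + 1, k - 1)
    t = pos - i0
    # Interpolate along rows, then along columns.
    rows = ctrl[i0] * (1 - t)[:, None, None] + ctrl[i1] * t[:, None, None]
    field = rows[:, i0] * (1 - t)[None, :, None] + rows[:, i1] * t[None, :, None]
    return strength * field / size


def warp(image, field):
    """Hypothetical helper: bilinearly resample an (H, W) or (H, W, C) image
    along the identity grid displaced by `field`."""
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    # Displaced sampling grid, clipped so every sample stays inside the image.
    gx = np.clip(xs + field[..., 0], -1.0, 1.0)
    gy = np.clip(ys + field[..., 1], -1.0, 1.0)
    # Convert normalised coordinates to pixel coordinates.
    px = (gx + 1) / 2 * (w - 1)
    py = (gy + 1) / 2 * (h - 1)
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (px - x0)[..., None]
    wy = (py - y0)[..., None]
    img = image.astype(float)
    if img.ndim == 2:  # grayscale: add a temporary channel axis
        img = img[..., None]
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    out = top * (1 - wy) + bottom * wy
    return out.reshape(image.shape)


# Usage: one fixed field acts as the trigger for every poisoned input.
rng = np.random.default_rng(0)
field = make_warp_field(k=4, size=32, strength=0.5, rng=rng)
colour = rng.uniform(0.0, 1.0, size=(32, 32, 3))
gray = colour.mean(axis=2)
poisoned_colour = warp(colour, field)
poisoned_gray = warp(gray, field)
```

Because the displacement is smooth and amounts to only a fraction of a pixel here, the warped image remains visually close to the original. The same `field` is reused for every poisoned sample, which is what allows a model to learn to associate it with the backdoor behaviour. The same function handles both grayscale and colour inputs.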