Black-box Adversarial Attacks using Substitute models

Effects of Data Distributions on Sample Transferability

Bachelor Thesis (2022)
Author(s)

P.M. Vigilanza Lorenzo (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S. Roos – Mentor (TU Delft - Data-Intensive Systems)

J. Huang – Mentor (TU Delft - Data-Intensive Systems)

C. Hong – Mentor (TU Delft - Data-Intensive Systems)

G. Lan – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Pietro Vigilanza Lorenzo
Publication Year
2022
Language
English
Graduation Date
24-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine Learning (ML) models are vulnerable to adversarial samples: human-imperceptible changes to regular inputs that elicit incorrect output from a given model. Many adversarial attacks assume the attacker has access to the underlying model or to the data used to train it. In this paper we instead focus on the effect the data distribution has on the transferability of adversarial samples under a "black-box" scenario. We assume the attacker has to train a separate model (the "substitute model") and generate adversaries using this independent model. The substitute models are trained on different data distributions: data that is symmetric with, a cross-section of, or completely disjoint from the data used to train the target model. The results demonstrate that an attacker only needs semantically similar data to execute an effective attack using a substitute model and well-known gradient-based adversarial generation techniques. Under ideal attack scenarios, target model accuracy can drop below 50%. Furthermore, our research shows that generating adversarial images from an ensemble of substitute models increases average attack success.
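
As an illustration of the transfer setting described in the abstract, the sketch below (Python/PyTorch, not the thesis code; the model objects, data loader, and epsilon value are assumptions made for the example) crafts adversarial samples on a substitute model with the Fast Gradient Sign Method, one of the well-known gradient-based techniques, and then measures the target model's accuracy on those samples.

# Illustrative sketch only: FGSM adversaries crafted on a substitute model,
# evaluated against a separate target model to estimate transferability.
import torch
import torch.nn as nn

def fgsm_attack(substitute_model, images, labels, epsilon):
    """Perturb inputs along the sign of the loss gradient of the substitute model."""
    images = images.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(substitute_model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0, 1).detach()

def transfer_accuracy(target_model, substitute_model, loader, epsilon=0.03):
    """Accuracy of the target model on adversaries generated with the substitute model."""
    substitute_model.eval()
    target_model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        adversarial = fgsm_attack(substitute_model, images, labels, epsilon)
        with torch.no_grad():
            predictions = target_model(adversarial).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    return correct / total

The same evaluation could be repeated with substitute models trained on symmetric, cross-section, or disjoint data, or with gradients averaged over an ensemble of substitutes, to compare how each setup affects transferability.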

Files

Final_paper_pv.pdf
(pdf | 0.797 MB)
License info not available