Black-box Adversarial Attacks using Substitute models

Effects of Data Distributions on Sample Transferability

Bachelor Thesis (2022)
Author(s)

P.M. Vigilanza Lorenzo (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S. Roos – Mentor (TU Delft - Data-Intensive Systems)

J. Huang – Mentor (TU Delft - Data-Intensive Systems)

C. Hong – Mentor (TU Delft - Data-Intensive Systems)

G. Lan – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Pietro Vigilanza Lorenzo
Publication Year
2022
Language
English
Graduation Date
24-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine Learning (ML) models are vulnerable to adversarial samples: human-imperceptible changes to regular inputs that elicit incorrect output from a given model. Many adversarial attacks assume the attacker has access to the underlying model or to the data used to train it. In this paper we instead focus on the effect the data distribution has on the transferability of adversarial samples under a "black-box" scenario. We assume the attacker has to train a separate model (the "substitute model") and generate adversaries using this independent model. The substitute models are trained on different data distributions: data that is symmetric with, a cross-section of, or completely disjoint from the data used to train the target model. The results demonstrate that an attacker only needs semantically similar data to execute an effective attack using a substitute model and well-known gradient-based adversarial generation techniques. Under ideal attack scenarios, target model accuracy can drop below 50%. Furthermore, our research shows that generating adversarial images from an ensemble of substitute models increases average attack success.
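
As an illustration of the transfer setting described in the abstract, the sketch below (Python/PyTorch, not the thesis code; the model objects, data loader, and epsilon value are assumptions made for the example) crafts adversarial samples on a substitute model with the Fast Gradient Sign Method, one of the well-known gradient-based techniques, and then measures the target model's accuracy on those samples.

# Illustrative sketch only: FGSM adversaries crafted on a substitute model,
# evaluated against a separate target model to estimate transferability.
import torch
import torch.nn as nn

def fgsm_attack(substitute_model, images, labels, epsilon):
    """Perturb inputs along the sign of the loss gradient of the substitute model."""
    images = images.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(substitute_model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0, 1).detach()

def transfer_accuracy(target_model, substitute_model, loader, epsilon=0.03):
    """Accuracy of the target model on adversaries generated with the substitute model."""
    substitute_model.eval()
    target_model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        adversarial = fgsm_attack(substitute_model, images, labels, epsilon)
        with torch.no_grad():
            predictions = target_model(adversarial).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    return correct / total

The same evaluation could be repeated with substitute models trained on symmetric, cross-section, or disjoint data, or with gradients averaged over an ensemble of substitutes, to compare how each setup affects transferability.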

Files

Final_paper_pv.pdf
(pdf | 0.797 MB)
License info not available