Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge [PRESENTATION]

Other (2023)
Authors

Ali Al-Kaswan (TU Delft - Software Engineering)

Maliheh Izadi (TU Delft - Software Engineering)

Arie van Deursen (TU Delft - Software Technology)

Research Group
Software Engineering
Copyright
© 2023 A. Al-Kaswan, M. Izadi, A. van Deursen
Publication Year
2023
Language
English
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Previous work has shown that Large Language Models are susceptible to so-called data extraction attacks, which allow an attacker to extract samples contained in the training data and therefore have serious privacy implications. Constructing data extraction attacks is challenging: current attacks are quite inefficient, and there is a significant gap between what untargeted attacks can extract and what models are known to memorize. Targeted attacks have therefore been proposed, which determine whether a given sample from the training data is extractable from a model. In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge. We use a two-step approach. In the first step, we maximise the recall of the model and are able to extract the suffix for 69% of the samples. In the second step, we apply a classifier-based Membership Inference Attack to the generations. Our AutoSklearn classifier achieves a precision of 0.841. The full approach reaches a recall of 0.405 at a 10% false positive rate, a 34% improvement over the baseline of 0.301.
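As a rough illustration of the two-step pipeline described in the abstract, the sketch below first generates a candidate suffix for a given prefix with GPT-Neo (step 1) and then derives simple loss-based features for a membership classifier built with auto-sklearn (step 2). The model checkpoint, greedy decoding, 50-token suffix length, and choice of features are illustrative assumptions, not necessarily the configuration used in this work.

```python
# Hedged sketch of a two-step targeted extraction pipeline.
# Model choice, decoding settings, and classifier features are assumptions.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from autosklearn.classification import AutoSklearnClassifier

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
lm = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B").to(device).eval()

def generate_suffix(prefix_ids: torch.Tensor, suffix_len: int = 50) -> torch.Tensor:
    """Step 1: propose a candidate suffix for a training-set prefix (shape (1, L))."""
    out = lm.generate(prefix_ids.to(device),
                      max_new_tokens=suffix_len,
                      do_sample=False)             # greedy decoding as a simple baseline
    return out[0, prefix_ids.shape[1]:]            # keep only the generated suffix tokens

def suffix_features(prefix_ids: torch.Tensor, suffix_ids: torch.Tensor) -> np.ndarray:
    """Step 2 (features): per-token loss of the candidate suffix under the model.
    A hypothetical feature set; the actual classifier may use different signals."""
    ids = torch.cat([prefix_ids[0].to(device), suffix_ids.to(device)]).unsqueeze(0)
    with torch.no_grad():
        logits = lm(ids).logits[0, :-1]            # next-token logits for each position
    logprobs = torch.log_softmax(logits, dim=-1)
    targets = ids[0, 1:]
    token_lp = logprobs[torch.arange(targets.shape[0]), targets]
    suffix_lp = token_lp[prefix_ids.shape[1] - 1:]  # log-probs of suffix tokens only
    return np.array([-suffix_lp.mean().item(), -suffix_lp.min().item()])

# Step 2 (classifier): membership inference over the generated suffixes.
# X_train / y_train would come from prefixes whose true suffixes are known.
# clf = AutoSklearnClassifier(time_left_for_this_task=300)
# clf.fit(X_train, y_train)
# p_member = clf.predict_proba(X_test)[:, 1]       # threshold to trade recall vs. FPR
```

Thresholding the classifier's membership probability is what lets the attack trade recall against the 10% false positive rate reported above.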
