State-of-the-art models are susceptible to adversarial attacks, which can cause catastrophic misclassification in settings where robustness is required. With the increasing popularity of the retrieval augmentation paradigm in deep learning, we adopt it as a fully differentiable framework for adversarial robustness. We evaluate our method on three visual classification datasets, including ImageNet, and attack our model with two white-box attacks and a black-box attack under various L2 and L∞ norms. The results indicate that a robust classifier emerges when the model fully relies on retrieved examples. We find that we can already obtain a PGD-robust ImageNet classifier with 80.1% clean and 64.7% adversarial accuracy, using only one or two examples per class from the training data in the memory set. In contrast to other adversarial defense mechanisms, our method works directly on top of pre-trained models and remains robust as PGD attacks increase in strength, where other defenses begin to degrade.
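The abstract does not spell out the architecture, but the core idea of a fully differentiable retrieval-augmented classifier can be sketched as follows: a frozen pre-trained encoder embeds the input, the embedding softly attends over a small memory set of labeled training examples, and the prediction is the attention-weighted combination of the memory labels. This is a minimal illustrative sketch, not the paper's exact method; the names `retrieval_classify`, `encoder`, `memory_embeddings`, `memory_labels`, and `temperature` are assumptions.

```python
import torch
import torch.nn.functional as F

def retrieval_classify(encoder, memory_embeddings, memory_labels, num_classes,
                       x, temperature=0.1):
    """Sketch: classify x by soft retrieval over a labeled memory set."""
    # q: (B, D) query embeddings from a frozen pre-trained encoder (assumption)
    q = F.normalize(encoder(x), dim=-1)
    # k: (M, D) embeddings of the memory set, e.g. 1-2 training examples per class
    k = F.normalize(memory_embeddings, dim=-1)
    # Soft, fully differentiable "retrieval": attention weights over memory items
    weights = (q @ k.t() / temperature).softmax(dim=-1)      # (B, M)
    # Prediction = attention-weighted combination of the memory labels
    one_hot = F.one_hot(memory_labels, num_classes).float()  # (M, num_classes)
    return weights @ one_hot                                 # (B, num_classes)
```

Because the softmax attention keeps the whole pipeline differentiable end to end, input gradients are well defined, so a defense of this form can be evaluated with genuine white-box attacks such as PGD rather than benefiting from gradient obfuscation.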