Generalization and Data Transformation Invariance of Visual Attention Models

Bachelor Thesis (2022)
Author(s)

P.G.M. de Kruijff (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Wendelin Böhmer – Mentor (TU Delft - Algorithmics)

C.B. Poulsen – Mentor (TU Delft - Programming Languages)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Pepijn de Kruijff
Publication Year
2022
Language
English
Graduation Date
24-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper compares the generalizing capability of multi-head attention (MHA) models with that of convolutional neural networks (CNNs) by comparing their performance on out-of-distribution data. The dataset used to train both models is created by coupling digits from the MNIST dataset with a set amount of background images from the CIFAR-10 dataset. An out-of-distribution sample is generated by using a background not seen during training. This paper compares the accuracy of both models on such out-of-distribution samples as an indication of their generalizability. Furthermore, the invariance of MHA models to certain affine data transformations is compared to that of CNNs. The results indicate that MHAs might be slightly better at generalizing to unseen data, but that CNNs are better able to generalize to the data transformations performed in this paper's experiments.
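The following is a minimal sketch, not the thesis' actual pipeline, of how the described dataset construction could be implemented: MNIST digits are composited onto CIFAR-10 backgrounds, with a held-out set of backgrounds used to generate out-of-distribution samples. The blending rule, the number of training backgrounds (n_seen), and the centring of the digit are assumptions for illustration only.

```python
# Hypothetical sketch of the digit-on-background composition described in the
# abstract; blending rule, background split size, and placement are assumptions.
import numpy as np
from torchvision import datasets

mnist = datasets.MNIST(root="data", train=True, download=True)
cifar = datasets.CIFAR10(root="data", train=True, download=True)

rng = np.random.default_rng(0)

# Hold out CIFAR-10 backgrounds: "seen" backgrounds build the training set,
# "unseen" backgrounds build the out-of-distribution test set.
n_seen = 100                                   # assumed "set amount" of training backgrounds
indices = rng.permutation(len(cifar.data))
seen_bg, unseen_bg = indices[:n_seen], indices[n_seen:]

def compose(digit_img, background):
    """Paste a 28x28 MNIST digit onto the centre of a 32x32 CIFAR-10 image."""
    canvas = background.astype(np.float32).copy()           # (32, 32, 3)
    digit = np.asarray(digit_img, dtype=np.float32) / 255   # (28, 28)
    mask = digit[..., None]                                 # digit intensity as alpha mask
    canvas[2:30, 2:30] = (1 - mask) * canvas[2:30, 2:30] + mask * 255
    return canvas.astype(np.uint8)

digit_img, label = mnist[0]
in_dist = compose(digit_img, cifar.data[rng.choice(seen_bg)])     # training-style sample
out_dist = compose(digit_img, cifar.data[rng.choice(unseen_bg)])  # out-of-distribution sample
```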
