On the Regularization of Convolutional Neural Networks and Transformers under Distribution Shifts
L.Z. Assini (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Wendelin Böhmer – Mentor (TU Delft - Algorithmics)
C.B. Poulsen – Graduation committee member (TU Delft - Programming Languages)
Abstract
The use of Transformers outside the realm of natural language processing is becoming increasingly prevalent. On image-classification data sets such as CIFAR-100, the Transformer has been shown to perform on par with the much more established Convolutional Neural Network. This paper investigates the out-of-distribution capabilities of the multi-head attention mechanism through the classification of the MNIST data set with added backgrounds. Additionally, various regularization techniques are applied to further improve generalization. Regularization is shown to be an important tool for improving out-of-distribution accuracy, though it may come with trade-offs in in-distribution settings.
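The following is a minimal sketch, in PyTorch, of the kind of setup the abstract describes: a small Transformer encoder classifying 28x28 MNIST images split into patch tokens, regularized with dropout and weight decay. The architecture, patch size, and all hyperparameters here are illustrative assumptions, not the configuration used in the paper.

    # A minimal sketch (not the paper's implementation): a small Transformer
    # encoder over MNIST patch tokens, with two common regularizers.
    import torch
    import torch.nn as nn

    class PatchTransformer(nn.Module):
        def __init__(self, patch=7, dim=64, heads=4, layers=2, classes=10, dropout=0.1):
            super().__init__()
            self.patch = patch
            n_patches = (28 // patch) ** 2                 # 16 tokens for 7x7 patches
            self.embed = nn.Linear(patch * patch, dim)     # per-patch linear embedding
            self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
            enc_layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=2 * dim,
                dropout=dropout, batch_first=True)         # dropout regularizes attention/FFN
            self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
            self.head = nn.Linear(dim, classes)

        def forward(self, x):                              # x: (B, 1, 28, 28)
            p = self.patch
            x = x.unfold(2, p, p).unfold(3, p, p)          # (B, 1, 4, 4, p, p)
            x = x.reshape(x.size(0), -1, p * p)            # (B, 16, p*p) patch tokens
            z = self.encoder(self.embed(x) + self.pos)     # multi-head self-attention
            return self.head(z.mean(dim=1))                # mean-pool tokens, classify

    model = PatchTransformer()
    # Weight decay is a second regularizer, applied through the optimizer.
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
    logits = model(torch.randn(8, 1, 28, 28))              # smoke test: shape (8, 10)

Dropout inside the encoder layers and weight decay via AdamW are only two of the regularizers the abstract alludes to; distribution-shift interventions such as adding backgrounds to the digits would instead be applied in the data pipeline before the images reach the model.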