On Color and Symmetries for Data Efficient Deep Learning

Abstract

Computer vision algorithms are getting more advanced by the day and are slowly approaching human-like capabilities, such as detecting objects in cluttered scenes and recognizing facial expressions. Yet, computers learn to perform these tasks very differently from humans. Whereas humans can generalize across different lighting conditions or geometric orientations with ease, computers require vast amounts of training data to adapt from day to night images, or even to recognize a cat hanging upside-down. This requires additional data, annotations, and compute power, increasing the development costs of useful computer vision models. This thesis is therefore concerned with reducing the data and compute hunger of computer vision algorithms by incorporating prior knowledge into the model architecture. Knowledge that is built in no longer needs to be learned from data.

This thesis considers various knowledge priors. To improve the robustness of deep learning models to changes in illumination, we make use of color invariant representations derived from physics-based reflection models. We find that a color invariant input layer effectively normalizes the feature map activations throughout the entire network, thereby reducing the distribution shift that normally occurs between day and night images.
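To illustrate the general pattern of such an invariant input layer, the sketch below prepends a simple normalized-rgb (chromaticity) transform, which cancels the overall illumination intensity under a Lambertian reflection model, to an arbitrary backbone. This is only an illustrative stand-in: the thesis derives its invariants from physics-based reflection models, and the module name and shapes here are assumptions for the sake of the example.

```python
import torch
import torch.nn as nn

class ChromaticityLayer(nn.Module):
    """Illustrative color invariant input layer (not the thesis formulation).

    Maps RGB to normalized rgb chromaticity, which cancels the overall
    illumination intensity under a Lambertian reflection model, so the
    network sees a normalized input before any features are learned.
    """

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps  # avoids division by zero in dark pixels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 3, H, W) RGB image; dividing by the per-pixel channel sum
        # removes the intensity component, leaving only chromaticity.
        return x / (x.sum(dim=1, keepdim=True) + self.eps)

# Usage: place the invariant layer in front of an existing backbone,
# e.g. model = nn.Sequential(ChromaticityLayer(), backbone)
```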

Equivariance has proven to be a useful network property for improving data efficiency. We introduce the color equivariant convolution, where spatial features are explicitly shared between different colors. This improves generalization to out-of-distribution colors, and therefore reduces the amount of required training data.
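The weight-sharing idea can be sketched as a lifting layer in which one set of spatial filters is reused across a small discrete group of color transformations. The toy example below uses the three cyclic permutations of the RGB channels as a crude stand-in for the hue rotations used in the thesis; the class name and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ColorLiftingConv(nn.Module):
    """Sketch of a color equivariant lifting layer (illustrative only).

    A single set of spatial filters is shared across a small discrete group
    of color transformations; here the group is the three cyclic permutations
    of the RGB channels. Cyclically permuting the input colors then simply
    permutes the group axis of the output, so the same spatial feature is
    detected regardless of its color.
    """

    def __init__(self, out_channels: int, kernel_size: int = 3):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, 3, kernel_size, kernel_size) * 0.1
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 3, H, W); output: (B, out_channels, |group|=3, H', W')
        outputs = []
        for shift in range(3):
            # color-shifted copy of the shared spatial filter
            w = torch.roll(self.weight, shifts=shift, dims=1)
            outputs.append(F.conv2d(x, w))
        return torch.stack(outputs, dim=2)
```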

We subsequently investigate Group Equivariant Convolutions (GConvs). First, we discover that GConv filters learn redundant symmetries, which can be hard-coded using separable convolutions. This preserves equivariance to rotation and mirroring, and improves data and compute efficiency. We also explore the notion of approximate equivariance in GConvs. Subsampling is known to introduce equivariance errors in regular convolutional layers, and we find that it similarly breaks exact equivariance to rotation and mirroring. This turns out to be a double-edged sword: it improves performance on in-distribution data, but degrades out-of-distribution generalization. Finally, we show that exact equivariance can be restored by choosing an appropriate input size.
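The interplay between subsampling and equivariance can be illustrated with a small numerical check. The toy setup below, which is not the thesis implementation, uses a single strided convolution with a filter symmetrized over 90-degree rotations as a stand-in for a full GConv, and compares rotating before versus after the convolution. In this toy setting the equivariance error vanishes (up to floating point) for odd input sizes and reappears for even ones, mirroring the dependence on input size described above.

```python
import torch
import torch.nn.functional as F

def rotation_equivariance_error(input_size: int, stride: int = 2) -> float:
    """Compares rot90(conv(x)) with conv(rot90(x)) for a strided convolution
    with a 3x3 filter that is invariant to 90-degree rotations. With stride 1
    the two agree exactly; with subsampling the result depends on the input
    size, because the subsampling grid is not always rotation-symmetric."""
    torch.manual_seed(0)
    x = torch.randn(1, 1, input_size, input_size)
    w = torch.randn(1, 1, 3, 3)
    # Symmetrize the filter over the four 90-degree rotations so that the
    # only possible source of equivariance error is the sampling grid.
    w = sum(w.rot90(k, dims=(2, 3)) for k in range(4))

    conv_then_rot = F.conv2d(x, w, stride=stride, padding=1).rot90(1, dims=(2, 3))
    rot_then_conv = F.conv2d(x.rot90(1, dims=(2, 3)), w, stride=stride, padding=1)
    return (conv_then_rot - rot_then_conv).abs().max().item()

for n in (31, 32, 33):
    # error is near zero for the odd sizes and clearly nonzero for the even one
    print(n, rotation_equivariance_error(n))
```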

This thesis aims to provide a step forward in the adoption of invariant and equivariant architectures to improve data and compute efficiency in deep learning.