A. Lengyel | TU Delft Repository

On Color and Symmetries for Data Efficient Deep Learning

Doctoral thesis (2024) - A. Lengyel

Computer vision algorithms are getting more advanced by the day and slowly approach human-like capabilities, such as detecting objects in cluttered scenes and recognizing facial expressions. Yet, computers learn to perform these tasks very differently from humans. Where humans can generalize between different lighting conditions or geometric orientations with ease, computers require vast amounts of training data to adapt from day to night images, or even to recognize a cat hanging upside-down. This requires additional data, annotations and compute power, increasing the development costs of useful computer vision models. This thesis is therefore concerned with reducing the data and compute hunger of computer vision algorithms by incorporating prior knowledge into the model architecture. Knowledge that is built in no longer needs to be learned from data. This thesis considers various knowledge priors. To improve the robustness of deep learning models to changes in illumination, we make use of color invariant representations derived from physics-based reflection models. We find that a color invariant input layer effectively normalizes the feature map activations throughout the entire network, thereby reducing the distribution shift that normally occurs between day and night images. Equivariance has proven to be a useful network property for improving data efficiency. We introduce the color equivariant convolution, where spatial features are explicitly shared between different colors. This improves generalization to out-of-distribution colors, and therefore reduces the amount of required training data. We subsequently investigate Group Equivariant Convolutions (GConvs). First, we discover that GConv filters learn redundant symmetries, which can be hard-coded using separable convolutions. This preserves equivariance to rotation and mirroring, and improves data and compute efficiency. We also explore the notion of approximate equivariance in GConvs. 
Subsampling is known to introduce equivariance errors in regular convolutional layers, and we find that it similarly breaks exact equivariance for rotation and mirroring. This turns out to be a double-edged sword: while it improves performance on in-distribution data, at the same time it negatively affects out-of-distribution generalization. Finally, we show that exact equivariance can be restored by choosing an appropriate input size. This thesis aims to provide a step forward in the adoption of invariant and equivariant architectures to improve data and compute efficiency in deep learning. ...

Using and Abusing Equivariance

Conference paper (2023) - T.F. Edixhoven, A. Lengyel, J.C. van Gemert

In this paper we show how Group Equivariant Convolutional Neural Networks use subsampling to learn to break equivariance to the rotation and reflection symmetries. We focus on the 2D rotations and reflections and investigate the impact of the broken equivariance on network performance. We show that a change in the input dimension of a network as small as a single pixel can be enough for commonly used architectures to become approximately equivariant, rather than exactly. We investigate the impact of networks not being exactly equivariant and find that approximately equivariant networks generalise significantly worse to unseen symmetries compared to their exactly equivariant counterparts. However, when the symmetries in the training data are not identical to the symmetries of the network, we find that approximately equivariant networks can relax their equivariance constraints, matching or outperforming exactly equivariant networks on common benchmarks. ...
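The single-pixel sensitivity described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes plain stride-2 subsampling on a square grid and 90-degree rotations only, and the helper names `rot90`, `subsample` and `commutes_with_rotation` are hypothetical. It checks whether subsampling a rotated input gives the same result as rotating the subsampled input.

```python
def rot90(grid):
    """Rotate a square grid (list of rows) by 90 degrees counterclockwise."""
    n = len(grid)
    return [[grid[j][n - 1 - i] for j in range(n)] for i in range(n)]


def subsample(grid, stride=2):
    """Keep every `stride`-th row and column, starting at index 0."""
    return [row[::stride] for row in grid[::stride]]


def commutes_with_rotation(n, stride=2):
    """True if subsampling an n x n grid commutes with a 90-degree rotation."""
    # Unique values per cell, so any misalignment of the sampling grid shows up.
    grid = [[i * n + j for j in range(n)] for i in range(n)]
    return subsample(rot90(grid), stride) == rot90(subsample(grid, stride))


if __name__ == "__main__":
    for size in (4, 5, 6, 7):
        print(size, commutes_with_rotation(size))
```

In this toy setting the sampled indices 0, 2, 4, ... are mirror-symmetric only when the side length is odd, so an input of size 5 keeps the rotation symmetry while an input of size 4 does not. Real architectures combine several strided layers with padding, so the exact condition on the input size differs per network.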

Color Equivariant Convolutional Networks

Conference paper (2023) - A. Lengyel, O. Strafforello, R. Bruintjes, A.S. Gielisse, J.C. van Gemert

Color is a crucial visual cue readily exploited by Convolutional Neural Networks (CNNs) for object recognition. However, CNNs struggle if there is data imbalance between color variations introduced by accidental recording conditions. Color invariance addresses this issue but does so at the cost of removing all color information, which sacrifices discriminative power. In this paper, we propose Color Equivariant Convolutions (CEConvs), a novel deep learning building block that enables shape feature sharing across the color spectrum while retaining important color information. We extend the notion of equivariance from geometric to photometric transformations by incorporating parameter sharing over hue-shifts in a neural network. We demonstrate the benefits of CEConvs in terms of downstream performance on various tasks and improved robustness to color changes, including train-test distribution shifts. Our approach can be seamlessly integrated into existing architectures, such as ResNets, and offers a promising solution for addressing color-based domain shifts in CNNs. ...
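The idea of parameter sharing over hue shifts can be illustrated with a toy sketch. It is not the CEConv implementation: it assumes only three discrete hue shifts of 120 degrees, which act on an RGB pixel as a cyclic permutation of its channels, and it uses a single 1x1 filter on a single pixel. The helper names `roll`, `dot` and `hue_orbit_responses` are hypothetical. One weight vector is reused in all three hue-shifted versions, and hue-shifting the input then only permutes the stacked responses.

```python
def roll(values, k):
    """Cyclically shift a list by k positions (element i moves to i + k)."""
    n = len(values)
    return [values[(i - k) % n] for i in range(n)]


def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(x * y for x, y in zip(a, b))


def hue_orbit_responses(weights, pixel):
    """Apply one RGB filter in all three hue-shifted versions to one pixel.

    Assumption: a 120-degree hue shift is modeled as a cyclic permutation
    of the RGB channels, so the shifted filters share the same parameters.
    """
    return [dot(roll(weights, k), pixel) for k in range(3)]


if __name__ == "__main__":
    weights = [2, -1, 5]      # hypothetical filter weights over (R, G, B)
    pixel = [10, 20, 40]      # hypothetical RGB pixel
    shifted_pixel = roll(pixel, 1)  # input hue-shifted by one step

    original = hue_orbit_responses(weights, pixel)
    shifted = hue_orbit_responses(weights, shifted_pixel)

    # Equivariance: shifting the input hue permutes the response stack.
    print(shifted == roll(original, 1))
```

Because the responses are permuted rather than discarded, color information is kept, in contrast to a color invariant representation that would collapse the three responses into one.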

Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models

Conference paper (2023) - J. Warchocki, T. Oprescu, Y. Wang, A. Dămăcuș, P.M. Misterka, R. Bruintjes, A. Lengyel, O. Strafforello, J.C. van Gemert

In temporal action localization, given an input video, the goal is to predict which actions it contains, where they begin, and where they end. Training and testing current state-of-the-art deep learning models requires access to large amounts of data and computational power. However, gathering such data is challenging and computational resources might be limited. This work explores and measures how current deep temporal action localization models perform in settings constrained by the amount of data or computational power. We measure data efficiency by training each model on a subset of the training set. We find that TemporalMaxer outperforms other models in data-limited settings. Furthermore, we recommend TriDet when training time is limited. To test the efficiency of the models during inference, we pass videos of different lengths through each model. We find that TemporalMaxer requires the least computational resources, likely due to its simple architecture. ...

Copy-Pasting Coherent Depth Regions Improves Contrastive Learning for Urban-Scene Segmentation

Conference paper (2022) - L. Zeng, A. Lengyel, N. Tömen, J.C. van Gemert

In this work, we leverage estimated depth to boost self-supervised contrastive learning for segmentation of urban scenes, where unlabeled videos are readily available for training self-supervised depth estimation. We argue that the semantics of a coherent group of pixels in 3D space is self-contained and invariant to the contexts in which they appear. We group coherent, semantically related pixels into coherent depth regions given their estimated depth and use copy-paste to synthetically vary their contexts. In this way, cross-context correspondences are built in contrastive learning and a context-invariant representation is learned. For unsupervised semantic segmentation of urban scenes, our method surpasses the previous state-of-the-art baseline by +7.14% in mIoU on Cityscapes and +6.65% on KITTI. For fine-tuning on Cityscapes and KITTI segmentation, our method is competitive with existing models, yet we do not need to pre-train on ImageNet or COCO and are also more computationally efficient. Our code is available at https://github.com/LeungTsang/CPCDR. ...

Exploiting Learned Symmetries in Group Equivariant Convolutions

Conference paper (2021) - Attila Lengyel, Jan van Gemert

Group Equivariant Convolutions (GConvs) enable convolutional neural networks to be equivariant to various transformation groups, but at an additional parameter and compute cost. We investigate the filter parameters learned by GConvs and find certain conditions under which they become highly redundant. We show that GConvs can be efficiently decomposed into depthwise separable convolutions while preserving equivariance properties and demonstrate improved performance and data efficiency on two datasets. All code is publicly available at github.com/Attila94/SepGrouPy. ...

Zero-Shot Day-Night Domain Adaptation with a Physics Prior

Conference paper (2021) - A. Lengyel, Sourav Garg, Michael Milford, J.C. van Gemert

We explore the zero-shot setting for day-night domain adaptation. The traditional domain adaptation setting is to train on one domain and adapt to the target domain by exploiting unlabeled data samples from the test set. As gathering relevant test data is expensive and sometimes even impossible, we remove any reliance on test data imagery and instead exploit a visual inductive prior derived from physics-based reflection models for domain adaptation. We cast a number of color invariant edge detectors as trainable layers in a convolutional neural network and evaluate their robustness to illumination changes. We show that the color invariant layer reduces the day-night distribution shift in feature map activations throughout the network. We demonstrate improved performance for zero-shot day-to-night domain adaptation on both synthetic and natural datasets in various tasks, including classification, segmentation and place recognition. ...