Learning with Spatial Biases in Vision Models

Doctoral Thesis (2026)
Author(s)

R. Bruintjes (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.J.T. Reinders – Promotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.C. van Gemert – Copromotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Pattern Recognition and Bioinformatics
DOI related publication
https://doi.org/10.4233/uuid:2e2d7ac2-509b-4979-b349-165b16eb03e9 Final published version
Publication Year
2026
Language
English
Defense Date
24-06-2026
Awarding Institution
Delft University of Technology
ISBN (electronic)
978-94-6518-343-5
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Computer vision is a large and still-growing research field. Many of its papers concern some type of inductive bias, proposing new building blocks or alternative training methods for vision models. This line of research has enabled great progress in applications of vision models.

Computer vision concerns itself with the research and development of deep learning models that work on visual data. These vision models are already heavily integrated into society, powering real-world applications such as automated radiology in hospitals, self-driving cars, and autonomous drones. However, it takes a lot of data, in the form of datasets containing thousands or millions of images, to learn reliable vision models. This thesis explores the role that spatial biases (prior knowledge on the position and pose of objects in the image) can play in learning better and more data-efficient vision models.

We find that the practice of integrating prior knowledge on spatial biases (inductive spatial biases) can help to learn biases that are otherwise hard or impossible to learn. Though inductive bias can be difficult and time-consuming to design, and often increases inference cost, integrating inductive bias can result in better performance and greater data efficiency. This work showcases these patterns in spatial biases, specifically position bias and scale bias.

We find that position bias may be learned to some degree by models without the proper inductive bias, but that inductive bias helps to model these biases and improves performance. We show that whether learning position bias is helpful depends on the data. To support these findings, we contribute measures of position bias for vision models in general and for Vision Transformers (ViTs) specifically. We propose an inductive bias on the position embedding of ViTs to better (un)learn position bias.

For scale bias, we find that existing scale-equivariant models need to be tuned to the scale distribution of the data. We propose an inductive bias that allows scale-equivariant models to learn the scale bias of the dataset, thereby fitting the data better. We also propose an alternative parameterization of convolutions called MAGNet that can be adapted to known scale distributions present in the data. Models using MAGNets (FlexNets) can be much shallower and do not require pooling.

There are those who advocate against spending much time on inductive biases. The “bitter lesson” of Richard Sutton prescribes that we should simply add more data, not more inductive bias. However, data will run out at some point, perhaps sooner rather than later. Moreover, the raw performance of vision models given as much data as possible should not be our only goal: data-deficient settings are real, plentiful, and important. Data-efficient vision models are the future of our field, and the search for appropriate inductive biases will remain an important endeavor.

Files

License info not available

File under embargo until 25-06-2026