Learning with Spatial Biases in Vision Models


doi:10.4233/uuid:2e2d7ac2-509b-4979-b349-165b16eb03e9

Doctoral Thesis (2026)

Author(s)

R. Bruintjes (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.J.T. Reinders – Promotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.C. van Gemert – Copromotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Pattern Recognition and Bioinformatics

Keywords: Computer vision, Data efficiency, Scale equivariance, Spatial biases, Visual priors, Kernel size

DOI related publication

https://doi.org/10.4233/uuid:2e2d7ac2-509b-4979-b349-165b16eb03e9 Final published version

Publication Year

2026

Language

English

Defense Date

24-06-2026

Awarding Institution

Delft University of Technology

Abstract

Computer vision is a large and still-growing research field. Many of its papers concern some type of inductive bias, proposing new building blocks or alternative training methods for vision models. This type of research has enabled great progress in applications of vision models.

Computer vision concerns the research and development of deep learning models that operate on visual data. These vision models are already heavily integrated into society, powering real-world applications such as automated radiology in hospitals, self-driving cars, and autonomous drones. However, learning reliable vision models requires large amounts of data, in the form of datasets containing thousands or millions of images. This thesis explores the role that spatial biases (prior knowledge on the position and pose of objects in the image) can play in learning better and more data-efficient vision models.

We find that integrating prior knowledge on spatial biases (inductive spatial biases) can help models learn biases that are otherwise hard or impossible to learn. Although inductive bias can be difficult and time-consuming to design, and often increases inference cost, integrating it can result in better performance and greater data efficiency. This work demonstrates these patterns for two spatial biases: position bias and scale bias.

We find that models without the proper inductive bias may learn position bias to some degree, but that inductive bias helps to model it and improves performance. We show that whether learning position bias is helpful depends on the data. To enable these findings, we contribute measures of position bias for vision models in general, as well as for Vision Transformers (ViTs) specifically. We propose an inductive bias on the position embedding of ViTs to better learn or unlearn position bias.

For scale bias, we find that existing scale-equivariant models need to be tuned to the scale distribution of the data. We propose an inductive bias that allows scale-equivariant models to learn the scale bias of the dataset, thereby fitting the data better. We also propose an alternative parameterization of convolutions, called MAGNet, that can be adapted to known scale distributions present in the data. Models using MAGNets (FlexNets) can be much shallower and do not require pooling.

Some advocate against spending much time on inductive biases. The "bitter lesson" of Richard Sutton prescribes that we should simply add more data, not more inductive bias. However, data will run out at some point, perhaps sooner rather than later. Besides, the raw performance of vision models given as much data as possible should not be our only goal: data-deficient settings are real, plentiful, and important. Data-efficient vision models are the future of our field, and the search for appropriate inductive biases will remain an important endeavor.

Files

BruintjesThesisFinalPrint.pdf

(pdf | 38 Mb)

- Embargo expired on 25-06-2026

License info not available