Reducing data in visual AI

Assessing the Data Efficiency of Masked Autoencoders in Resource-Constrained Environments

Bachelor Thesis (2026)

Author(s)

D. Terziev (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A.D. Manolache – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

P.J.W. Reijalt – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.J.G. Olsthoorn – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Data Efficiency, Self-supervised learning, Vision Transformer, Masked Autoencoders

To reference this document use

https://resolver.tudelft.nl/uuid:409dbc8f-3348-45d2-ba7c-3f334762df7d

More Info

Publication Year

2026

Language

English

Graduation Date

26-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Visual foundation models based on Vision Transformers often depend on large datasets and substantial computational resources, limiting their accessibility for resource-constrained research settings. This paper investigates the data efficiency of Masked Autoencoders (MAE) by studying how pre-training dataset size and mask ratio affect downstream representation quality. An MAE model is pre-trained on nested subsets of the same dataset ranging from 1k to 100k images, using different mask ratios, and then evaluated on a different downstream task dataset. The results show that MAE learns transferable representations even from small unlabeled datasets, with downstream accuracy increasing steadily as more pre-training data is used. The experiments also show that the optimal masking difficulty depends on the data regime: lower masking improves validation accuracy for the smallest subsets, while the original 75% MAE mask ratio becomes stronger as the dataset size increases. These findings suggest that mask ratio should not be treated as a fixed default in MAE training. Instead, reducing the mask ratio can improve data efficiency when pre-training data is limited, while higher masking remains effective when more visual variation is available.
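
The experimental design described in the abstract (nested pre-training subsets combined with different mask ratios) can be illustrated with a minimal sketch. This is not the thesis code. The intermediate subset size, the 196-patch grid (a 224x224 image cut into 16x16 patches), the 0.5 mask ratio, and the function names `nested_subsets` and `random_mask` are all assumptions made for illustration. Only the 1k to 100k range and the 75% mask ratio come from the abstract.

```python
import random


def nested_subsets(num_images, sizes, seed=0):
    """Return {size: image indices} where every smaller subset is
    contained in every larger one (prefixes of one shuffled order)."""
    rng = random.Random(seed)
    order = list(range(num_images))
    rng.shuffle(order)
    return {n: order[:n] for n in sorted(sizes)}


def random_mask(num_patches, mask_ratio, rng):
    """Split patch indices into visible and masked sets, MAE-style.
    Only the visible patches would be passed to the encoder."""
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)


# Assumed sizes: only the 1k and 100k endpoints are stated in the abstract.
subsets = nested_subsets(100_000, [1_000, 10_000, 100_000])

# Assumed grid of 196 patches per image.
rng = random.Random(0)
visible_75, masked_75 = random_mask(196, 0.75, rng)  # original MAE ratio
visible_50, masked_50 = random_mask(196, 0.50, rng)  # hypothetical lower ratio

print(len(visible_75), len(masked_75))
print(len(visible_50), len(masked_50))
```

Drawing every subset from the same shuffled order means any difference in downstream accuracy between two sizes comes from the added images rather than from a different sample. A lower mask ratio leaves more visible patches per image, which is the sense in which the reconstruction task becomes easier.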

Files

Dimo_Terziev_final_paper.pdf

(pdf | 0.971 MB)

License info not available