Reducing data in visual AI

Assessing the Data Efficiency of Masked Autoencoders in Resource-Constrained Environments

Bachelor Thesis (2026)
Author(s)

D. Terziev (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A.D. Manolache – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

P.J.W. Reijalt – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.J.G. Olsthoorn – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
26-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Visual foundation models based on Vision Transformers often depend on large datasets and substantial computational resources, which limits their accessibility in resource-constrained research settings. This paper investigates the data efficiency of Masked Autoencoders (MAE) by studying how pre-training dataset size and mask ratio affect downstream representation quality. An MAE model is pre-trained on nested subsets of the same dataset, ranging from 1k to 100k images, using different mask ratios, and is then evaluated on a separate downstream dataset. The results show that MAE learns transferable representations even from small unlabeled datasets, with downstream accuracy increasing steadily as more pre-training data is used. The experiments also show that the optimal masking difficulty depends on the data regime: lower mask ratios improve validation accuracy for the smallest subsets, while the original 75% MAE mask ratio performs better as the dataset size increases. These findings suggest that the mask ratio should not be treated as a fixed default in MAE training. Instead, reducing the mask ratio can improve data efficiency when pre-training data is limited, while higher masking remains effective when more visual variation is available.
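As a rough illustration of the setup described in the abstract, the sketch below shows one way nested subsets and a random patch mask could be constructed. It is not taken from the thesis: the function names, the intermediate subset size of 10k, the fixed seed, and the count of 196 patches per image (a 224x224 image split into 16x16 patches) are all assumptions made for the example. Only the 1k to 100k range and the 75% mask ratio come from the abstract.

```python
import random


def nested_subsets(num_images, sizes, seed=0):
    """Return {size: image indices} where every smaller subset is a prefix
    of the larger ones, so the subsets are nested by construction.
    (Hypothetical helper; the thesis may build its subsets differently.)"""
    rng = random.Random(seed)
    order = list(range(num_images))
    rng.shuffle(order)  # one fixed shuffle shared by all subset sizes
    return {n: order[:n] for n in sorted(sizes)}


def random_patch_mask(num_patches, mask_ratio, rng):
    """Randomly choose which patches to hide from the encoder.
    Returns (visible_ids, masked_ids) as sorted lists of patch indices."""
    num_masked = int(num_patches * mask_ratio)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    return sorted(ids[num_masked:]), sorted(ids[:num_masked])


# Assumed subset sizes; the abstract only states the 1k to 100k range.
subsets = nested_subsets(100_000, sizes=[1_000, 10_000, 100_000])

# Assumed 196 patches per image, masked at the original 75% MAE ratio.
visible, masked = random_patch_mask(196, 0.75, random.Random(0))
```

Taking prefixes of a single shuffled order means that comparisons across dataset sizes differ only in the images that were added, which is one simple way to obtain the nesting property the abstract mentions. Lowering `mask_ratio` in this sketch leaves more patches visible to the encoder, corresponding to the easier masking setting that the abstract reports as helpful for the smallest subsets.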

Files

Dimo_Terziev_final_paper.pdf
(pdf | 0.971 Mb)
License info not available