Reducing Data in Visual AI: I-JEPA

Optimizing I-JEPA for Data Efficiency

Bachelor Thesis (2026)
Author(s)

M. Plotnikov (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.J.G. Olsthoorn – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

P.J.W. Reijalt – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A.D. Manolache – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
22-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Self-supervised learning eliminates the need for image labels when learning meaningful visual representations, but it does not remove the need for large pretraining datasets. This work studies how the Image-based Joint-Embedding Predictive Architecture (I-JEPA) behaves when pretraining data is deliberately limited. We train I-JEPA on stratified Tiny ImageNet subsets and evaluate the frozen representations with CIFAR-10 linear probing. The results show a steep improvement from the smallest subsets to the medium-data regime, followed by a plateau at the largest subsets under the standard final-checkpoint protocol. We also test two architectural modifications motivated by I-JEPA's design: reducing predictor capacity, to test whether an over-expressive predictor absorbs the pretext task instead of forcing useful encoder features, and adding shared photometric augmentation, to test whether extra input variation helps in low-data training. The shallow predictor improves transfer at 32k and 64k images but is neutral or harmful at the smallest and largest splits. The augmentation decreases downstream accuracy at 16k and is neutral at 32k. Additional controls (predictor depth sweeps, fixed-update budgets, and intermediate checkpoint analysis) suggest that the largest-split plateau is partly a training-dynamics issue rather than a pure data-efficiency ceiling. A cross-method comparison with Barlow Twins, MoCo, DINO, and MAE under the shared protocol contextualizes I-JEPA's data efficiency among SSL alternatives.
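
The abstract does not say how the stratified subsets are drawn. As a rough illustration only, the sketch below shows one common way to build a class-balanced subset from a list of labels. The function name `stratified_subset`, the fixed seed, the requirement that the subset size divides evenly across classes, and the reading of "16k" as 16,000 images are all assumptions made for this example, not details taken from the thesis. The toy label list mirrors the public shape of the Tiny ImageNet training split (200 classes, 500 images per class).

```python
import random
from collections import defaultdict


def stratified_subset(labels, subset_size, seed=0):
    """Return sorted indices of a class-balanced subset (hypothetical helper).

    labels: sequence of class ids, one per image.
    subset_size: total number of images to keep; assumed to be divisible
        by the number of classes (a simplification for this sketch).
    """
    # Group image indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    num_classes = len(by_class)
    if subset_size % num_classes != 0:
        raise ValueError("subset_size must be divisible by the number of classes")
    per_class = subset_size // num_classes

    # Draw the same number of images from every class, without replacement.
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        if per_class > len(indices):
            raise ValueError(f"class {label} has only {len(indices)} images")
        chosen.extend(rng.sample(indices, per_class))
    return sorted(chosen)


# Toy stand-in for a labelled dataset: 200 classes with 500 images each.
toy_labels = [class_id for class_id in range(200) for _ in range(500)]

# Assumption: "16k" is read as 16,000 images, i.e. 80 images per class.
subset_16k = stratified_subset(toy_labels, 16_000)
```

Fixing the seed makes each subset reproducible, which matters when several methods are compared on the same splits. Whether the thesis nests the smaller subsets inside the larger ones is not stated, and this sketch makes no attempt to do so.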

Files

Main.pdf
(pdf | 0.531 Mb)
License info not available