Reducing Data in Visual AI: I-JEPA

Optimizing I-JEPA for Data Efficiency

Bachelor Thesis (2026)

Author(s)

M. Plotnikov (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.J.G. Olsthoorn – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

P.J.W. Reijalt – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A.D. Manolache – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Computer Vision, Data Efficiency, JEPA

To reference this document use

https://resolver.tudelft.nl/uuid:34c9347e-f34d-4def-a93b-1ce9cb0ee669

Publication Year

2026

Language

English

Graduation Date

22-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Self-supervised learning removes the need for image labels when learning meaningful visual representations, but it does not remove the need for large pretraining datasets. This work studies how the Image-based Joint-Embedding Predictive Architecture (I-JEPA) behaves when pretraining data is deliberately limited. We train I-JEPA on stratified Tiny ImageNet subsets and evaluate the frozen representations with linear probing on CIFAR-10. The results show a steep improvement from the smallest subsets to the medium-data regime, followed by a plateau at the largest subsets under the standard final-checkpoint protocol. We also test two modifications motivated by I-JEPA's design: reducing predictor capacity, to test whether an over-expressive predictor absorbs the pretext task instead of forcing the encoder to learn useful features, and adding shared photometric augmentation, to test whether extra input variation helps in low-data training. The shallow predictor improves transfer at 32k and 64k images but is neutral or harmful at the smallest and largest splits. The augmentation decreases downstream accuracy at 16k and is neutral at 32k. Additional controls (predictor depth sweeps, fixed-update budgets, and intermediate checkpoint analysis) suggest that the largest-split plateau is partly a training-dynamics issue rather than a pure data-efficiency ceiling. A cross-method comparison with Barlow Twins, MoCo, DINO, and MAE under the shared protocol places I-JEPA's data efficiency in context among SSL alternatives.
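The abstract does not say how the stratified subsets are drawn. As an illustration only, the sketch below assumes class-proportional sampling over a label list shaped like the Tiny ImageNet training set (200 classes, 500 images each). The function name `stratified_subset`, the seed, and the flooring rule are hypothetical choices, not the thesis's procedure.

```python
import random
from collections import defaultdict


def stratified_subset(labels, subset_size, seed=0):
    """Return sorted indices of a subset that keeps each class's share of the data.

    Hypothetical helper: quotas are floored, so the result can be slightly
    smaller than `subset_size` when class sizes do not divide evenly.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so a split can be reproduced
    total = len(labels)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        # Each class contributes in proportion to its size in the full set.
        quota = subset_size * len(members) // total
        chosen.extend(rng.sample(members, quota))
    return sorted(chosen)


# Assumed layout: 200 classes x 500 training images, as in Tiny ImageNet.
labels = [c for c in range(200) for _ in range(500)]
subset_16k = stratified_subset(labels, 16_000)
```

Under these assumptions a 16k split keeps 80 images from every class, so the class balance of the full set is preserved while only the amount of data per class changes.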

Files

Main.pdf

(pdf | 0.531 MB)

License info not available