Reducing Data in Visual AI: I-JEPA
Optimizing I-JEPA for Data Efficiency
M. Plotnikov (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.C. van Gemert – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.J.G. Olsthoorn – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
P.J.W. Reijalt – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A.D. Manolache – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Self-supervised learning (SSL) eliminates the need for image labels when learning meaningful visual representations, but it does not remove the need for large pretraining datasets. This work studies how the Image-based Joint-Embedding Predictive Architecture (I-JEPA) behaves when pretraining data is deliberately limited. We pretrain I-JEPA on stratified subsets of Tiny ImageNet and evaluate the frozen representations by linear probing on CIFAR-10. Under the standard final-checkpoint protocol, accuracy improves steeply from the smallest subsets to the medium-data regime and then plateaus at the largest subsets. We also test two architectural modifications motivated by I-JEPA's design. The first reduces predictor capacity, to test whether an over-expressive predictor solves the pretext task by itself instead of forcing the encoder to learn useful features. The second adds shared photometric augmentation, to test whether extra input variation helps in low-data training. The shallow predictor improves transfer at 32k and 64k images but is neutral or harmful at the smallest and largest splits. The augmentation decreases downstream accuracy at 16k images and is neutral at 32k. Additional controls (predictor depth sweeps, fixed-update budgets, and intermediate-checkpoint analysis) suggest that the largest-split plateau is partly a training-dynamics issue rather than a pure data-efficiency ceiling. A cross-method comparison with Barlow Twins, MoCo, DINO, and MAE under the same protocol places I-JEPA's data efficiency in context among SSL alternatives.
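The abstract does not say how the stratified Tiny ImageNet subsets are drawn. The sketch below makes one assumption explicit: "stratified" here means an equal number of images per class, sampled uniformly at random with a fixed seed. The function name `stratified_subset` and its parameters are hypothetical and do not come from the thesis code. The Tiny ImageNet training set has 200 classes with 500 images each, so under this assumption a 16k split keeps 80 images per class and a 64k split keeps 320.

```python
import random
from collections import defaultdict


def stratified_subset(labels, subset_size, seed=0):
    """Return sorted indices of a class-balanced subset of `labels`.

    Hypothetical scheme: every class contributes subset_size // num_classes
    samples, drawn uniformly without replacement using a seeded RNG.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    num_classes = len(by_class)
    per_class, remainder = divmod(subset_size, num_classes)
    if remainder:
        raise ValueError("subset_size must be divisible by the number of classes")

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    chosen = []
    for label in sorted(by_class):  # sorted order keeps sampling deterministic
        pool = by_class[label]
        if per_class > len(pool):
            raise ValueError(f"class {label} has only {len(pool)} samples")
        chosen.extend(rng.sample(pool, per_class))
    return sorted(chosen)


if __name__ == "__main__":
    # Stand-in labels shaped like the Tiny ImageNet training set:
    # 200 classes x 500 images per class.
    labels = [c for c in range(200) for _ in range(500)]
    for size in (16_000, 32_000, 64_000):
        subset = stratified_subset(labels, size)
        print(size, len(subset), size // 200)
```

Drawing each class independently keeps the class distribution identical across split sizes, so differences in linear-probe accuracy reflect the amount of data rather than a shift in class balance. Nested splits, where each smaller split is contained in the next larger one, would be an equally plausible design that this sketch does not implement.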