MP
M. Plotnikov
info
Please Note
<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
1 records found
1
Reducing Data in Visual AI: I-JEPA
Optimizing I-JEPA for Data Efficiency
Bachelor thesis
(2026)
-
M. Plotnikov, J.C. van Gemert, M.J.G. Olsthoorn, P.J.W. Reijalt, A.D. Manolache
Self-supervised learning eliminates the need for image labels to learn meaningful visual representations, but it does not remove the need for large pretraining datasets. This work studies how Image-based Joint-Embedding Predictive Architecture (I-JEPA) behaves when pretraining data is deliberately limited. We train I-JEPA on stratified Tiny ImageNet subsets and evaluate the frozen representations with CIFAR-10 linear probing. The results show a steep improvement from the smallest subsets to the medium-data regime, followed by a plateau around the largest subsets under the standard final-checkpoint protocol. We also test two architectural modifications motivated by I-JEPA's design: reducing predictor capacity, to test whether an over-expressive predictor absorbs the pretext task instead of forcing useful encoder features, and adding shared photometric augmentation, to test whether extra input variation helps in low-data training. The shallow predictor improves transfer at 32k and 64k images but is neutral or harmful at the smallest and largest splits. The augmentation decreased downstream accuracy at 16k and was neutral at 32k. Additional controls---predictor depth sweeps, fixed-update budgets, and intermediate checkpoint analysis---suggest that the largest-split plateau is partly a training-dynamics issue rather than a pure data-efficiency ceiling. A cross-method comparison with Barlow Twins, MoCo, DINO, and MAE under the shared protocol contextualizes I-JEPA's data efficiency among SSL alternatives.
...
Self-supervised learning eliminates the need for image labels to learn meaningful visual representations, but it does not remove the need for large pretraining datasets. This work studies how Image-based Joint-Embedding Predictive Architecture (I-JEPA) behaves when pretraining data is deliberately limited. We train I-JEPA on stratified Tiny ImageNet subsets and evaluate the frozen representations with CIFAR-10 linear probing. The results show a steep improvement from the smallest subsets to the medium-data regime, followed by a plateau around the largest subsets under the standard final-checkpoint protocol. We also test two architectural modifications motivated by I-JEPA's design: reducing predictor capacity, to test whether an over-expressive predictor absorbs the pretext task instead of forcing useful encoder features, and adding shared photometric augmentation, to test whether extra input variation helps in low-data training. The shallow predictor improves transfer at 32k and 64k images but is neutral or harmful at the smallest and largest splits. The augmentation decreased downstream accuracy at 16k and was neutral at 32k. Additional controls---predictor depth sweeps, fixed-update budgets, and intermediate checkpoint analysis---suggest that the largest-split plateau is partly a training-dynamics issue rather than a pure data-efficiency ceiling. A cross-method comparison with Barlow Twins, MoCo, DINO, and MAE under the shared protocol contextualizes I-JEPA's data efficiency among SSL alternatives.