A.D. Manolache
6 records found
The Homunculus in Deep Learning
On Learning RNN Gates with RNNs
Reducing data in visual AI
Assessing the Data Efficiency of Masked Autoencoders in Resource-Constrained Environments
Reducing Data for Vision Foundation Models
Data-Efficiency of Self-Supervised Learning with DINO Multi-Crop
We pretrain a small Vision Transformer (ViT-Tiny/8) with DINO on Tiny-ImageNet subsets ranging from 1K to 100K images at 64×64 resolution and evaluate it on downstream classification tasks. Downstream accuracy grows steadily with pretraining-set size and approaches that of a fully supervised baseline at the largest scale.
Our main contribution is a multi-crop ablation across data scale, training duration, and downstream task category. We find that multi-crop's benefit at sub-ImageNet scale is delayed rather than absent, and that the optimal multi-crop count depends on the downstream task category: no single setting wins across all tasks.
These findings show that the canonical DINO recipe does not transfer cleanly to sub-ImageNet scale. We recommend choosing the multi-crop count based on training budget and downstream task type, rather than copying the ImageNet default.
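For readers unfamiliar with the multi-crop count discussed above, the following is a minimal sketch of multi-crop view generation, not the authors' code. It assumes two global views plus `n_local` local views per image. The crop scale ranges, the 32-pixel local size, the nearest-neighbour resize, and the helper names `random_resized_crop` and `multi_crop` are all illustrative assumptions rather than settings taken from the abstract.

```python
import numpy as np


def random_resized_crop(img, scale, out_size, rng):
    """Cut a random square covering a fraction of the image area drawn
    from `scale`, then resize it to out_size x out_size (nearest neighbour)."""
    h, w = img.shape[:2]
    area_frac = rng.uniform(*scale)
    side = int(round(np.sqrt(area_frac * h * w)))
    side = max(1, min(side, h, w))
    top = rng.integers(0, h - side + 1)   # upper bound is exclusive
    left = rng.integers(0, w - side + 1)
    crop = img[top:top + side, left:left + side]
    # Nearest-neighbour resize via integer index lookup (assumption: a real
    # pipeline would use a proper interpolating resize instead).
    idx = np.arange(out_size) * side // out_size
    return crop[idx][:, idx]


def multi_crop(img, n_local, rng,
               global_scale=(0.4, 1.0),   # assumed range, not from the abstract
               local_scale=(0.05, 0.4),   # assumed range, not from the abstract
               global_size=64, local_size=32):
    """Return two global views and `n_local` local views of one image."""
    global_views = [random_resized_crop(img, global_scale, global_size, rng)
                    for _ in range(2)]
    local_views = [random_resized_crop(img, local_scale, local_size, rng)
                   for _ in range(n_local)]
    return global_views, local_views


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))           # stand-in for a 64x64 RGB image
global_views, local_views = multi_crop(image, n_local=6, rng=rng)
```

Here `n_local` plays the role of the multi-crop count: setting it to 0 disables local views, while larger values add more small views per image at a higher compute cost per training step.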
Reducing Data for Vision Foundation Models
Data-Efficiency of Self-Supervised Learning with Momentum Contrast
How Does the Downstream Accuracy of Barlow Twins Scale with Pre-training Set Size?
A small-compute characterization with a ViT-Tiny on Tiny-ImageNet subsets
Reducing Data in Visual AI: I-JEPA
Optimizing I-JEPA for Data Efficiency