Manifold-Aware Regularization for Masked Autoencoders
A.E. Dondera (TU Delft - Electrical Engineering, Mathematics and Computer Science)
H. Jamali Rad – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
J.C. van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
M.A. Migut – Graduation committee member (TU Delft - Web Information Systems)
Abstract
Masked Autoencoders (MAEs) represent a significant shift in self-supervised learning (SSL) because, unlike contrastive frameworks, they do not depend on augmentation techniques to generate positive (and/or negative) pairs. Their masking-and-reconstruction strategy also aligns well with SSL approaches in natural language processing. However, most MAEs are built upon Transformer-based architectures in which visual features are not regularized, unlike in their convolutional neural network (CNN) based counterparts, which can limit their effectiveness. To address this, we introduce a novel batch-wide, layer-wise regularization loss applied to the representations of different Transformer layers. We demonstrate that plugging in the proposed regularization loss significantly improves the performance of MAE-based baselines.
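The abstract does not spell out the exact form of the regularization loss. Purely as an illustration of where such a term would sit in training, the sketch below shows one plausible way a batch-wide, layer-wise regularizer could be attached to intermediate Transformer representations in PyTorch and combined with the standard MAE reconstruction objective. The functions batch_similarity and layerwise_manifold_loss, the weighting factor lambda_reg, and the choice of a similarity-preservation penalty are all assumptions introduced here, not the authors' implementation.

    # Minimal sketch (assumed formulation, not the paper's exact loss):
    # regularize each Transformer layer so that the batch-wide similarity
    # structure of its representations stays close to that of the previous layer.
    import torch
    import torch.nn.functional as F

    def batch_similarity(features: torch.Tensor) -> torch.Tensor:
        """Cosine-similarity matrix over the batch.

        features: (B, D) per-sample representations, e.g. mean-pooled
        token embeddings of one Transformer layer.
        """
        z = F.normalize(features, dim=-1)
        return z @ z.t()  # (B, B)

    def layerwise_manifold_loss(layer_feats: list) -> torch.Tensor:
        """Hypothetical batch-wide, layer-wise regularizer: penalize drift of
        the batch similarity structure between consecutive layers."""
        loss = torch.zeros((), device=layer_feats[0].device)
        for prev, curr in zip(layer_feats[:-1], layer_feats[1:]):
            loss = loss + F.mse_loss(batch_similarity(curr), batch_similarity(prev))
        return loss / max(len(layer_feats) - 1, 1)

    # Usage (hypothetical): `intermediate_feats` would be collected, e.g. via
    # forward hooks on the encoder blocks, and `lambda_reg` is an assumed
    # weighting hyper-parameter.
    # total_loss = reconstruction_loss + lambda_reg * layerwise_manifold_loss(intermediate_feats)

The key design point this sketch tries to convey is that the regularizer acts on whole batches of representations at several layers at once, rather than on individual samples, so it can be added to an existing MAE training loop without changing the masking or reconstruction pipeline.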