Joint Embedding Predictive Architecture for Self-supervised Pretraining on Polymer Molecular Graphs
F. Piccoli (TU Delft - Electrical Engineering, Mathematics and Computer Science)
G. Vogel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
J.M. Weber – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Marcel J.T. Reinders – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Megha Khosla – Graduation committee member (TU Delft - Multimedia Computing)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Recent advances in machine learning (ML) have shown promise in accelerating polymer discovery, aiding tasks such as virtual screening via property prediction and the design of new polymer materials with desired chemical properties. However, progress in polymer ML is hampered by the scarcity of high-quality labelled datasets, which are necessary for training supervised ML models. In this work, we study the recently proposed Joint Embedding Predictive Architecture (JEPA) for self-supervised learning (SSL) on polymer molecular graphs, to understand whether pretraining with this SSL strategy improves downstream performance when labelled data is scarce. In doing so, this study aims to shed light on this new family of architectures in the molecular graph domain and to provide insights and directions for future research on JEPAs. Our experimental results indicate that JEPA self-supervised pretraining enhances downstream performance, particularly when labelled data is very scarce, achieving improvements across all tested datasets.
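To make the JEPA pretraining idea concrete, the following is a minimal NumPy sketch of one self-supervised step on a molecular graph: a context encoder embeds the visible part of the graph, a predictor regresses the embedding of a masked-out target part, and the target encoder is updated as an exponential moving average of the context encoder. All names, shapes, and the single linear "encoder" are illustrative assumptions standing in for the GNN encoders used in the actual work, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(nodes, weights):
    # Toy "encoder": one linear layer plus tanh, standing in for a GNN.
    return np.tanh(nodes @ weights)

# Hypothetical graph: 8 nodes with 4 features each, mapped to 6-dim embeddings.
node_features = rng.normal(size=(8, 4))
w_context = rng.normal(size=(4, 6)) * 0.1   # context-encoder weights
w_target = rng.normal(size=(4, 6)) * 0.1    # target-encoder weights (no gradients in practice)
w_predictor = np.eye(6)                     # toy predictor on top of the context embedding

# Split the graph's nodes into a visible context and a masked target subgraph.
context_nodes, target_nodes = node_features[:5], node_features[5:]

z_context = encode(context_nodes, w_context).mean(axis=0)  # pooled context embedding
z_target = encode(target_nodes, w_target).mean(axis=0)     # pooled target embedding

# Predict the target embedding from the context embedding; the loss lives in
# embedding space, so no node features are ever reconstructed.
z_hat = z_context @ w_predictor
loss = np.mean((z_hat - z_target) ** 2)

# Target-encoder weights slowly track the context encoder via an EMA update.
momentum = 0.99
w_target = momentum * w_target + (1 - momentum) * w_context
```

In a real setup the gradient of `loss` would update only the context encoder and predictor, while the EMA update keeps the target encoder stable, which is what prevents the representations from collapsing to a constant.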