Joint embedding predictive architecture for self-supervised pretraining on polymer molecular graphs
Francesco Piccoli (Student TU Delft)
Gabriel Vogel (TU Delft - Pattern Recognition and Bioinformatics)
Jana M. Weber (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Recent advances in machine learning (ML) have shown promise in accelerating the discovery of polymers with desired properties, for example by enabling virtual screening via property prediction. However, progress in polymer ML is hampered by the scarcity of high-quality labeled datasets, which are necessary for training supervised ML models. In this work, we study the use of the recently proposed Joint Embedding Predictive Architecture (JEPA), an architecture for self-supervised learning (SSL), on polymer molecular graphs, to understand whether pretraining with this SSL strategy improves downstream performance when labeled data is scarce. We first pretrain our polymer-JEPA model on a large dataset of conjugated copolymer photocatalysts. The pretrained model is then fine-tuned on two distinct downstream tasks: predicting electron affinity in the same chemical space, and classifying phase behavior in diblock copolymers, a different chemical space. Our results indicate that JEPA-based self-supervised pretraining enhances downstream performance, particularly when labeled data is very scarce, yielding improvements on both tested datasets. The method also provides performance gains in cross-domain fine-tuning, highlighting its potential to extract knowledge that generalizes across different classes of polymers. By leveraging large amounts of unlabeled polymer structures for pretraining, the proposed strategy can further reduce the dependence on extensive labeled datasets.
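To illustrate the JEPA idea described above, the sketch below shows a minimal, self-contained pretraining loop: part of a graph is hidden from a context encoder, and a predictor is trained to regress the hidden part's embedding as produced by an exponential-moving-average (EMA) target encoder. This is a deliberately simplified illustration with hypothetical names and shapes; the paper's polymer-JEPA operates on polymer molecular graphs with graph neural network encoders, which are replaced here by plain linear maps.

```python
import numpy as np

# Toy "graph": node feature matrix for one polymer graph (hypothetical sizes).
rng = np.random.default_rng(0)
n_nodes, feat_dim, emb_dim = 8, 4, 3
X = rng.normal(size=(n_nodes, feat_dim))

# Mask a target subgraph: these nodes are hidden from the context encoder.
mask = np.zeros(n_nodes, dtype=bool)
mask[:3] = True

W_ctx = rng.normal(size=(feat_dim, emb_dim))   # context encoder (trainable)
W_tgt = W_ctx.copy()                           # target encoder (EMA copy, no gradient)
W_pred = np.eye(emb_dim)                       # predictor (trainable)

lr, ema = 0.05, 0.99
losses = []
for step in range(200):
    ctx_summary = (X[~mask] @ W_ctx).mean(axis=0)   # pooled context embedding
    pred = ctx_summary @ W_pred                     # predicted target embedding
    tgt = (X[mask] @ W_tgt).mean(axis=0)            # target embedding (stop-gradient)
    err = pred - tgt
    losses.append(float(err @ err))                 # squared-error JEPA loss

    # Manual gradient descent for this linear sketch (no autodiff dependency).
    grad_ctx_summary = 2.0 * err @ W_pred.T
    W_pred -= lr * np.outer(ctx_summary, 2.0 * err)
    W_ctx -= lr * np.outer(X[~mask].mean(axis=0), grad_ctx_summary)

    # Target encoder tracks the context encoder via EMA, as in JEPA-style SSL.
    W_tgt = ema * W_tgt + (1.0 - ema) * W_ctx
```

Predicting in embedding space, rather than reconstructing raw node features, is the distinguishing design choice of JEPA relative to masked-reconstruction pretraining; the EMA target encoder keeps the prediction target stable while avoiding representational collapse.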