Joint Embedding Predictive Architecture for Self-supervised Pretraining on Polymer Molecular Graphs
F. Piccoli (TU Delft - Electrical Engineering, Mathematics and Computer Science)
G. Vogel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
J.M. Weber – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Marcel J.T. Reinders – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Megha Khosla – Graduation committee member (TU Delft - Multimedia Computing)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Recent advances in machine learning (ML) have shown promise in accelerating polymer discovery, aiding tasks such as virtual screening via property prediction and the design of new polymer materials with desired chemical properties. However, progress in polymer ML is hampered by the scarcity of high-quality labelled datasets, which are necessary for training supervised ML models. In this work, we study the recently proposed Joint Embedding Predictive Architecture (JEPA) for self-supervised learning (SSL) on polymer molecular graphs, to understand whether pretraining with this SSL strategy improves downstream performance when labelled data is scarce. In doing so, this study aims to shed light on this new family of architectures in the molecular graph domain and to provide insights and directions for future research on JEPAs. Our experimental results indicate that JEPA self-supervised pretraining enhances downstream performance, particularly when labelled data is very scarce, achieving improvements across all tested datasets.
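To make the JEPA pretraining idea concrete, the following is a minimal NumPy sketch of one self-supervised step on a molecular graph: a context encoder embeds the visible part of the graph, a predictor regresses the embedding of a masked-out target part, and the target encoder is updated as an exponential moving average of the context encoder. All names, shapes, and the single linear "encoder" are illustrative assumptions standing in for the GNN encoders used in the actual work, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(nodes, weights):
    # Toy "encoder": one linear layer plus tanh, standing in for a GNN.
    return np.tanh(nodes @ weights)

# Hypothetical graph: 8 nodes with 4 features each, mapped to 6-dim embeddings.
node_features = rng.normal(size=(8, 4))
w_context = rng.normal(size=(4, 6)) * 0.1   # context-encoder weights
w_target = rng.normal(size=(4, 6)) * 0.1    # target-encoder weights (no gradients in practice)
w_predictor = np.eye(6)                     # toy predictor on top of the context embedding

# Split the graph's nodes into a visible context and a masked target subgraph.
context_nodes, target_nodes = node_features[:5], node_features[5:]

z_context = encode(context_nodes, w_context).mean(axis=0)  # pooled context embedding
z_target = encode(target_nodes, w_target).mean(axis=0)     # pooled target embedding

# Predict the target embedding from the context embedding; the loss lives in
# embedding space, so no node features are ever reconstructed.
z_hat = z_context @ w_predictor
loss = np.mean((z_hat - z_target) ** 2)

# Target-encoder weights slowly track the context encoder via an EMA update.
momentum = 0.99
w_target = momentum * w_target + (1 - momentum) * w_context
```

In a real setup the gradient of `loss` would update only the context encoder and predictor, while the EMA update keeps the target encoder stable, which is what prevents the representations from collapsing to a constant.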