🤖 AI Summary
To address the scarcity of high-quality labeled data limiting downstream performance in polymer machine learning, this work introduces the Joint Embedding Predictive Architecture (JEPA) for self-supervised pretraining of polymer molecular graphs—the first such application to polymers. Our method leverages graph neural networks to learn label-free representations by contrasting local and global structural embeddings, effectively capturing both topological and chemical characteristics of polymers. Compared with conventional self-supervised approaches (e.g., GraphMAE, InfoGraph), it achieves substantial improvements in predicting key properties—including glass transition temperature and thermal conductivity—under extremely low-label regimes (1%–5% labeled data), reducing mean absolute error by 12.7%–23.4% across multiple polymer benchmark datasets. Key contributions are: (1) proposing the first JEPA-adapted pretraining paradigm for polymer graphs; and (2) demonstrating its strong generalization capability in few-shot settings, offering a novel pathway for intelligent materials design.
📝 Abstract
Recent advances in machine learning (ML) have shown promise in accelerating the discovery of polymers with desired properties by aiding in tasks such as virtual screening via property prediction. However, progress in polymer ML is hampered by the scarcity of high-quality labeled datasets, which are necessary for training supervised ML models. In this work, we study the use of the very recent 'Joint Embedding Predictive Architecture' (JEPA), a type of architecture for self-supervised learning (SSL), on polymer molecular graphs to understand whether pretraining with the proposed SSL strategy improves downstream performance when labeled data is scarce. Our results indicate that JEPA-based self-supervised pretraining on polymer graphs enhances downstream performance, particularly when labeled data is very scarce, achieving improvements across all tested datasets.