Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing graph self-supervised learning methods for molecular representation often overlook chemically relevant substructural information, limiting their ability to effectively model key fragments that govern molecular properties. To address this, we propose GraSPNet, a novel framework that introduces fragment-level modeling into self-supervised molecular graph learning without requiring a predefined vocabulary. By leveraging unsupervised fragment decomposition, GraSPNet performs hierarchical message passing and masked semantic prediction across both atomic and fragment granularities. This enables joint atom-fragment multi-resolution self-supervised learning, significantly enhancing the chemical interpretability, representational capacity, and cross-task transferability of learned representations. Extensive experiments demonstrate that GraSPNet consistently outperforms current graph self-supervised approaches on multiple molecular property prediction benchmarks.
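
The vocabulary-free fragment decomposition mentioned above can be illustrated with a minimal sketch. The summary does not state the actual decomposition rule, so the ring/chain split used here is only an assumed stand-in: fused ring systems become one fragment each, and each connected run of non-ring atoms becomes a chain fragment. The function names (`decompose`, `_reachable`, `_components`) and the bond-list input format are hypothetical and not taken from GraSPNet.

```python
from collections import defaultdict


def _reachable(adj, start, banned_edge):
    """Atoms reachable from `start` without crossing `banned_edge` (or None)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if banned_edge is not None and {u, v} == set(banned_edge):
                continue
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def _components(nodes, edges):
    """Connected components (sorted atom lists) of the graph (nodes, edges)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    comps, seen = [], set()
    for n in sorted(nodes):
        if n not in seen:
            comp = _reachable(adj, n, None)
            seen |= comp
            comps.append(sorted(comp))
    return comps


def decompose(num_atoms, bonds):
    """Split atoms into ring-system fragments and chain fragments.

    Assumption (not from the source): a bond is a ring bond if its two
    endpoints stay connected after the bond is removed. No fragment
    vocabulary is consulted; fragments follow from topology alone.
    """
    adj = defaultdict(set)
    for u, v in bonds:
        adj[u].add(v)
        adj[v].add(u)
    ring_bonds = [(u, v) for u, v in bonds if v in _reachable(adj, u, (u, v))]
    ring_atoms = {a for bond in ring_bonds for a in bond}
    chain_atoms = set(range(num_atoms)) - ring_atoms
    chain_bonds = [(u, v) for u, v in bonds
                   if u in chain_atoms and v in chain_atoms]
    return _components(ring_atoms, ring_bonds) + _components(chain_atoms, chain_bonds)


# Toluene skeleton: a six-membered ring (atoms 0-5) plus a methyl carbon (atom 6).
toluene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
print(decompose(7, toluene))  # [[0, 1, 2, 3, 4, 5], [6]]
```

Every atom lands in exactly one fragment, so the output can serve as an atom-to-fragment assignment for a coarser fragment-level graph.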

📝 Abstract
Graph self-supervised learning (GSSL) has demonstrated strong potential for generating expressive graph embeddings without the need for human annotations, making it particularly valuable in domains with high labeling costs such as molecular graph analysis. However, existing GSSL methods mostly focus on node- or edge-level information, often ignoring chemically relevant substructures which strongly influence molecular properties. In this work, we propose Graph Semantic Predictive Network (GraSPNet), a hierarchical self-supervised framework that explicitly models both atomic-level and fragment-level semantics. GraSPNet decomposes molecular graphs into chemically meaningful fragments without predefined vocabularies and learns node- and fragment-level representations through multi-level message passing with masked semantic prediction at both levels. This hierarchical semantic supervision enables GraSPNet to learn multi-resolution structural information that is both expressive and transferable. Extensive experiments on multiple molecular property prediction benchmarks demonstrate that GraSPNet learns chemically meaningful representations and consistently outperforms state-of-the-art GSSL methods in transfer learning settings.
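
To make "masked semantic prediction at both levels" concrete, here is a minimal sketch of a two-level masked embedding-prediction objective. The abstract does not specify the pooling operator, the loss, or how the two levels are weighted, so mean pooling, mean squared error, and the `frag_weight` parameter are all assumptions made for illustration. The function names are hypothetical, and the predicted embeddings would in practice come from a trained encoder and predictor that this sketch leaves out.

```python
def mean_pool(atom_emb, fragments):
    """Fragment embedding = mean of its member atoms' embeddings (assumed pooling)."""
    dim = len(atom_emb[0])
    return [
        [sum(atom_emb[a][d] for a in frag) / len(frag) for d in range(dim)]
        for frag in fragments
    ]


def masked_mse(predicted, target, masked_idx):
    """Mean squared error over embedding entries at masked positions only."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


def hierarchical_loss(pred_atom, tgt_atom, masked_atoms,
                      pred_frag, tgt_frag, masked_frags, frag_weight=1.0):
    """Atom-level loss plus weighted fragment-level loss (weighting is assumed)."""
    atom_term = masked_mse(pred_atom, tgt_atom, masked_atoms)
    frag_term = masked_mse(pred_frag, tgt_frag, masked_frags)
    return atom_term + frag_weight * frag_term


# Three atoms with 2-d target embeddings, grouped into two fragments.
tgt_atom = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
fragments = [[0, 1], [2]]
tgt_frag = mean_pool(tgt_atom, fragments)      # [[2.0, 1.0], [0.0, 4.0]]

# Stand-in predictions; atom 1 and fragment 0 are the masked positions.
pred_atom = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
pred_frag = [[2.0, 3.0], [0.0, 4.0]]
print(hierarchical_loss(pred_atom, tgt_atom, [1], pred_frag, tgt_frag, [0]))  # 4.0
```

Restricting each term to masked positions means the model is scored only on embeddings it had to infer from context, at the atom level and the fragment level alike.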

Problem

Research questions and friction points this paper is trying to address.

molecular representation learning
graph self-supervised learning
fragment-level semantics
hierarchical representation
molecular substructures

Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical representation learning
fragment-based decomposition
self-supervised graph learning
masked semantic prediction
molecular graph embedding
Jiele Wu
National University of Singapore
Machine Learning, Graph Neural Network, Neuroscience, Learning Theory
Haozhe Ma
School of Computing, National University of Singapore, Singapore
Zhihan Guo
University of Wisconsin-Madison
database systems
Thanh Vinh Vo
School of Computing, National University of Singapore, Singapore
Tze Yun Leong
School of Computing, National University of Singapore, Singapore