FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited chemical interpretability and scalability of atom-based and rule-based fragment tokenization in molecular property prediction, this paper proposes FragmentNet, a graph-to-sequence foundation model built around adaptive fragmentation. Methodologically, it integrates a VQVAE-GCN backbone for hierarchical fragment embeddings, spatial positional encodings for graph serialization, global molecular descriptors, and a Transformer architecture. Key contributions include: (1) a learned, adaptive graph fragmentation tokenizer that decomposes molecular graphs into chemically valid fragments while preserving structural connectivity; (2) a fragment-level masked modeling pretraining strategy (Masked Fragment Modeling); and (3) support for fragment-based editing and visualization of property trends in the learned embeddings. Fine-tuned on MoleculeNet tasks, FragmentNet outperforms models with similarly scaled architectures and datasets while rivaling larger state-of-the-art models that require significantly more resources.
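The summary names a VQVAE-GCN backbone but gives no implementation details. As a rough illustration of the "VQ" part only, the sketch below shows the generic vector-quantization lookup used by VQ-VAE models: a continuous embedding is replaced by its nearest codebook entry. The function name `quantize`, the toy codebook, and the 2-D embedding are assumptions made for illustration; the GCN encoder and FragmentNet's actual codebook are not shown and may differ.

```python
from typing import List, Sequence, Tuple


def quantize(
    embedding: Sequence[float],
    codebook: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Return the index and vector of the codebook entry nearest to `embedding`.

    Generic VQ-VAE-style lookup (squared Euclidean distance); this is an
    illustrative sketch, not the paper's implementation.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(range(len(codebook)), key=lambda i: sq_dist(embedding, codebook[i]))
    return best, list(codebook[best])


# Toy 2-D codebook with three entries (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

# A hypothetical fragment embedding, e.g. as produced by a graph encoder.
index, vector = quantize([0.9, 1.2], codebook)
```

In a VQ-VAE, the discrete `index` acts as the token and `vector` is what the decoder receives; how FragmentNet maps these onto fragments is described in the paper itself.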

📝 Abstract
Molecular property prediction uses molecular structure to infer chemical properties. Chemically interpretable representations that capture meaningful intramolecular interactions enhance the usability and effectiveness of these predictions. However, existing methods often rely on atom-based or rule-based fragment tokenization, which can be chemically suboptimal and lack scalability. We introduce FragmentNet, a graph-to-sequence foundation model with an adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments while preserving structural connectivity. FragmentNet integrates VQVAE-GCN for hierarchical fragment embeddings, spatial positional encodings for graph serialization, global molecular descriptors, and a transformer. Pre-trained with Masked Fragment Modeling and fine-tuned on MoleculeNet tasks, FragmentNet outperforms models with similarly scaled architectures and datasets while rivaling larger state-of-the-art models requiring significantly more resources. This novel framework enables adaptive decomposition, serialization, and reconstruction of molecular graphs, facilitating fragment-based editing and visualization of property trends in learned embeddings - a powerful tool for molecular design and optimization.
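The abstract does not spell out how Masked Fragment Modeling is set up. A minimal sketch of the general idea, assuming it follows the familiar masked-token recipe (hide some fragment tokens, train the model to recover them), might look like this. The `[MASK]` symbol, the 15% default rate, the helper name `mask_fragments`, and the SMILES-like fragment strings are all assumptions for illustration, not details taken from the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"  # hypothetical mask symbol


def mask_fragments(
    tokens: Sequence[str],
    mask_prob: float = 0.15,  # assumed rate, borrowed from common masked-LM practice
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask a fraction of fragment tokens and return (inputs, targets).

    `targets[i]` holds the original token at masked positions and None
    elsewhere, so a loss would be computed only on the masked positions.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Always mask at least one fragment so every sequence yields a training signal.
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = [t if i in positions else None for i, t in enumerate(tokens)]
    return inputs, targets


# A serialized molecule as a sequence of fragment tokens (illustrative strings).
fragments = ["c1ccccc1", "C(=O)O", "N", "CC", "O"]
inputs, targets = mask_fragments(fragments, mask_prob=0.4, seed=0)
```

A model pre-trained this way would receive `inputs` and be asked to predict the non-`None` entries of `targets`.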
Problem

Research questions and friction points this paper is trying to address.

Molecular Graph Representation
Property Prediction
Performance Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Graph Fragmentation
Masked Fragment Modeling
VQVAE-GCN
Ankur Samanta
PhD Student, Columbia University
AI Reasoning, Multi-modal foundation models, RLHF
Rohan Gupta
Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
Aditi Misra
Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
Christian McIntosh Clarke
Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
Jayakumar Rajadas
Advanced Drug Delivery and Regenerative Biomaterials Laboratory, Stanford Cardiovascular Institute, Palo Alto, USA