🤖 AI Summary
Existing pre-trained 2D graph encoders, though powerful, neglect submolecular structural knowledge (e.g., atoms, bonds), while mainstream molecular pre-training methods rely on task-specific designs that hinder flexible integration of diverse domain knowledge. To address this, we propose MolGA—a framework that, **without modifying the pre-trained encoder**, dynamically fuses topological representations with heterogeneous molecular knowledge (e.g., functional groups, pharmacophores, geometric constraints) at a fine granularity via a **molecular structure alignment strategy** and an **instance-level conditional adaptation mechanism**. MolGA preserves the encoder's generality while enhancing interpretability through knowledge-guided representation refinement. Evaluated on 11 molecular property prediction and biomedical benchmark datasets, MolGA consistently outperforms state-of-the-art baselines. It establishes an efficient, knowledge-enhanced paradigm for downstream molecular adaptation.
📝 Abstract
Molecular graph representation learning is widely used in chemical and biomedical research. While pre-trained 2D graph encoders have demonstrated strong performance, they overlook the rich molecular domain knowledge associated with submolecular instances (atoms and bonds). Although molecular pre-training approaches incorporate such knowledge into their pre-training objectives, they typically employ designs tailored to a specific type of knowledge, lacking the flexibility to integrate the diverse knowledge present in molecules. Hence, reusing widely available and well-validated pre-trained 2D encoders, while incorporating molecular domain knowledge during downstream adaptation, offers a more practical alternative. In this work, we propose MolGA, which adapts pre-trained 2D graph encoders to downstream molecular applications by flexibly incorporating diverse molecular domain knowledge. First, we propose a molecular alignment strategy that bridges the gap between pre-trained topological representations and domain-knowledge representations. Second, we introduce a conditional adaptation mechanism that generates instance-specific tokens to enable fine-grained integration of molecular domain knowledge for downstream tasks. Finally, we conduct extensive experiments on eleven public datasets, demonstrating the effectiveness of MolGA.
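To make the instance-level conditional adaptation idea concrete, the sketch below shows one plausible reading: a molecule's frozen-encoder output conditions a softmax mixture over a shared token bank, yielding an instance-specific token that gates the fusion with a domain-knowledge embedding. All names, shapes, and the fusion rule are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_adaptation(h_topo, h_know, W_cond, token_bank):
    """Hypothetical sketch: generate an instance-specific token by
    conditioning a shared learnable token bank on the molecule's
    frozen-encoder representation, then fuse topological and
    domain-knowledge views. Not the paper's actual API."""
    # Condition vector derived from the frozen 2D-encoder output
    cond = np.tanh(h_topo @ W_cond)                  # shape (d,)
    # Instance-specific mixture weights over the token bank (softmax)
    scores = token_bank @ cond                       # shape (num_tokens,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    instance_token = weights @ token_bank            # shape (d,)
    # Token-gated fusion of the two representation views (assumed form)
    return h_topo + instance_token * h_know          # shape (d,)

d, num_tokens = 8, 4
h_topo = rng.normal(size=d)            # frozen 2D-encoder output
h_know = rng.normal(size=d)            # e.g., functional-group embedding
W_cond = rng.normal(size=(d, d))       # conditioning projection (learned)
token_bank = rng.normal(size=(num_tokens, d))  # shared learnable tokens

fused = conditional_adaptation(h_topo, h_know, W_cond, token_bank)
print(fused.shape)  # (8,)
```

Because the encoder stays frozen, only `W_cond` and `token_bank` would be trained downstream, which matches the summary's claim of adapting without modifying the pre-trained encoder.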