AI Summary
In molecular relational learning (MRL), attention-based substructure alignment lacks guidance from chemical priors, which leads to unstable generalization under functional-group or scaffold distribution shifts. To address this, we propose a chemistry-knowledge-driven mechanism for dynamic representation alignment. First, we formalize the "induced fit" principle as a differentiable bias-correction function and integrate it with a subgraph information bottleneck to enable adaptive selection of functionally compatible substructure pairs. Second, we design a correction module based on substructure edge reconstruction and a chemistry-guided attention mechanism. Evaluated on nine benchmark datasets, our method significantly outperforms state-of-the-art approaches: it achieves average stability gains of 32.7% under both rule-based and scaffold-shift scenarios, and attains top performance on both tasks.
Abstract
Molecular Relational Learning (MRL) is widely applied in the natural sciences to predict relationships between molecular pairs by extracting structural features. The representational similarity between substructure pairs determines the functional compatibility of molecular binding sites. Nevertheless, aligning substructure representations with attention mechanisms lacks guidance from chemical knowledge, resulting in unstable model performance on data that is shifted in chemical space (e.g., by functional group or scaffold). With theoretical justification, we propose Representational Alignment with Chemical Induced Fit (ReAlignFit) to enhance the stability of MRL. ReAlignFit dynamically aligns substructure representations in MRL by introducing an inductive bias based on chemical Induced Fit. In the induction process, we design the Bias Correction Function based on substructure edge reconstruction to align representations between substructure pairs by simulating chemical conformational changes (dynamic combination of substructures). During the fit process, ReAlignFit further integrates the Subgraph Information Bottleneck to refine and optimize substructure pairs that exhibit high chemical functional compatibility, and uses them to generate molecular embeddings. Experimental results on nine datasets demonstrate that ReAlignFit outperforms state-of-the-art models on two tasks and significantly enhances model stability under both rule-shifted and scaffold-shifted data distributions.
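For orientation, the kind of attention-based substructure alignment that the abstract describes as lacking chemical guidance can be sketched roughly as below. This is a generic scaled dot-product illustration in plain Python, not ReAlignFit's Bias Correction Function or Subgraph Information Bottleneck. The function names (`attention_alignment`, `top_pairs`) and the toy embeddings are assumptions made purely for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_alignment(subs_a, subs_b):
    """Return a weight matrix: row i holds the attention of substructure i
    of molecule A over all substructures of molecule B (hypothetical helper)."""
    scale = math.sqrt(len(subs_a[0]))  # standard 1/sqrt(d) scaling
    return [softmax([dot(a, b) / scale for b in subs_b]) for a in subs_a]


def top_pairs(weights, k=1):
    """Pick the k substructure pairs (i, j) with the largest attention weight."""
    scored = [(w, i, j) for i, row in enumerate(weights) for j, w in enumerate(row)]
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:k]]


# Toy 2-dimensional substructure embeddings (illustrative values only).
mol_a = [[1.0, 0.0], [0.0, 1.0]]
mol_b = [[0.9, 0.1], [0.0, 2.0], [-1.0, 0.0]]

weights = attention_alignment(mol_a, mol_b)
best = top_pairs(weights, k=2)
```

Nothing in this sketch encodes chemistry: the weights depend only on embedding similarity, so a shift in functional groups or scaffolds changes the selected pairs with no chemical constraint to stabilize them. That is the gap the abstract says an induced-fit inductive bias is meant to address.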