🤖 AI Summary
Existing molecule-text representation learning methods struggle to model fine-grained alignment between molecular substructures and chemical phrases, which limits cross-modal semantic matching accuracy. To address this, we propose MolBridge, a novel framework for precise substructure-to-phrase alignment. MolBridge jointly optimizes graph and language encoders via substructure-aware contrastive learning, and introduces phrase-level alignment enhancement together with a self-refinement mechanism that dynamically filters noisy alignment signals. Evaluated on multiple molecule-text retrieval and cross-modal understanding benchmarks, MolBridge consistently outperforms state-of-the-art methods. Ablation studies confirm that fine-grained substructure-phrase alignment is critical for strengthening multimodal representations, establishing a more interpretable and semantically grounded approach to molecule-text modeling.
📝 Abstract
Molecule-text representation learning has gained increasing interest due to its potential for enhancing the understanding of chemical information. However, existing models often struggle to capture subtle differences between molecules and their descriptions, as they lack the ability to learn fine-grained alignments between molecular substructures and chemical phrases. To address this limitation, we introduce MolBridge, a novel molecule-text learning framework based on substructure-aware alignments. Specifically, we augment the original molecule-description pairs with additional alignment signals derived from molecular substructures and chemical phrases. To learn effectively from these enriched alignments, MolBridge employs substructure-aware contrastive learning, coupled with a self-refinement mechanism that filters out noisy alignment signals. Experimental results show that MolBridge captures fine-grained correspondences and outperforms state-of-the-art baselines on a wide range of molecular benchmarks, highlighting the significance of substructure-aware alignment in molecule-text learning.
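To make the two core ideas concrete, the substructure-aware contrastive objective can be sketched as an InfoNCE-style loss over matched substructure and phrase embeddings, and the self-refinement step can be approximated as a similarity threshold that discards low-confidence pairs. This is a minimal illustrative sketch, not the paper's exact formulation: the function names, the temperature value, and the thresholding rule are all assumptions for demonstration.

```python
import numpy as np

def substructure_phrase_loss(sub_emb, phr_emb, tau=0.07):
    """InfoNCE-style contrastive loss: row i of sub_emb is the positive
    match for row i of phr_emb; all other rows act as negatives."""
    s = sub_emb / np.linalg.norm(sub_emb, axis=1, keepdims=True)
    p = phr_emb / np.linalg.norm(phr_emb, axis=1, keepdims=True)
    logits = s @ p.T / tau                      # cosine similarities / temperature
    logits -= logits.max(axis=1, keepdims=True) # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # NLL of the true (diagonal) pairs

def self_refine(sub_emb, phr_emb, threshold=0.2):
    """Toy stand-in for self-refinement: keep only candidate
    substructure-phrase pairs whose cosine similarity exceeds a threshold."""
    s = sub_emb / np.linalg.norm(sub_emb, axis=1, keepdims=True)
    p = phr_emb / np.linalg.norm(phr_emb, axis=1, keepdims=True)
    return (s @ p.T) > threshold                # boolean mask over candidate pairs
```

In this toy setting, perfectly aligned pairs (identical embeddings on the diagonal) yield a much lower loss than randomly paired embeddings, which is the signal the contrastive objective exploits; the paper's actual self-refinement is learned rather than a fixed cutoff.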