Multi-Modal Molecular Representation Learning via Structure Awareness

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for multimodal molecular representation learning neglect inter-modal interactions and higher-order invariant relationships across molecules, limiting discriminability and generalization of learned representations. To address this, we propose MMSA, a structure-aware multimodal self-supervised pretraining framework. MMSA is the first to unify heterogeneous modalities—including molecular images and 2D/3D topological graphs—into a structure-aware hypergraph representation. It introduces a memory-anchored mechanism to explicitly capture cross-molecular higher-order associations and invariant features. By integrating multimodal collaborative encoding, hypergraph neural networks, and memory-enhanced contrastive learning, MMSA achieves state-of-the-art performance on the MoleculeNet benchmark, improving average ROC-AUC by 1.8–9.6% across multiple tasks. The framework significantly enhances generalization capability on downstream molecular property prediction tasks.

📝 Abstract
Accurate extraction of molecular representations is a critical step in the drug discovery process. In recent years, significant progress has been made in molecular representation learning, among which multi-modal methods based on images and 2D/3D topologies have become increasingly mainstream. However, these existing multi-modal approaches often directly fuse information from different modalities, overlooking the potential of inter-modal interactions and failing to adequately capture the complex higher-order relationships and invariant features between molecules. To overcome these challenges, we propose a structure-awareness-based multi-modal self-supervised molecular representation pre-training framework (MMSA) designed to enhance molecular graph representations by leveraging invariant knowledge between molecules. The framework consists of two main modules: a multi-modal molecular representation learning module and a structure-awareness module. The multi-modal molecular representation learning module collaboratively processes information from different modalities of the same molecule to overcome inter-modal differences and generate a unified molecular embedding. The structure-awareness module then enhances this representation by constructing a hypergraph to model higher-order correlations between molecules. It also introduces a memory mechanism that stores typical molecular representations and aligns them with memory anchors in a memory bank to integrate invariant knowledge, thereby improving the model's generalization ability. Extensive experiments demonstrate the effectiveness of MMSA, which achieves state-of-the-art performance on the MoleculeNet benchmark, with average ROC-AUC improvements ranging from 1.8% to 9.6% over baseline methods.
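The memory mechanism the abstract describes, a bank of typical representations that a molecule embedding is aligned against, can be sketched in plain Python. All names here (`MemoryBank`, `align`, the blend weight `alpha`) are illustrative assumptions, not the paper's actual implementation; the sketch only shows the general pattern of nearest-anchor retrieval by cosine similarity followed by a convex blend.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

class MemoryBank:
    """Hypothetical memory bank of 'typical' molecular representations
    (memory anchors), in the spirit of the structure-awareness module."""

    def __init__(self, anchors):
        self.anchors = anchors  # list of anchor embedding vectors

    def nearest_anchor(self, emb):
        # Retrieve the anchor most similar to the molecule embedding.
        return max(self.anchors, key=lambda a: cosine(emb, a))

    def align(self, emb, alpha=0.5):
        # Blend the embedding with its nearest anchor, injecting
        # cross-molecular invariant knowledge into the representation.
        anchor = self.nearest_anchor(emb)
        return [alpha * e + (1 - alpha) * a for e, a in zip(emb, anchor)]
```

For example, with anchors `[1, 0]` and `[0, 1]`, the embedding `[0.9, 0.1]` retrieves the first anchor and is pulled toward it; in the actual framework the blended representation would feed the memory-enhanced contrastive objective.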
Problem

Research questions and friction points this paper is trying to address.

Overcoming intermodal differences in molecular representation learning
Capturing higher-order relationships between molecules effectively
Improving model generalization with invariant molecular knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structure-awareness-based multi-modal self-supervised framework
Hypergraph structure models higher-order molecular correlations
Memory mechanism integrates invariant molecular knowledge
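The hypergraph modeling named above, where a hyperedge connects a whole group of related molecules rather than a single pair, can be illustrated with a minimal message-passing round in plain Python. This is a generic sketch of hyperedge-mediated aggregation, not MMSA's hypergraph neural network; the function name and the plain-averaging scheme are assumptions.

```python
def hypergraph_propagate(X, hyperedges):
    """One round of hyperedge-mediated message passing:
    each hyperedge averages the embeddings of its member nodes,
    then each node averages the messages of its incident hyperedges.
    X: list of embedding vectors; hyperedges: list of node-index sets."""
    dim = len(X[0])
    # Step 1: hyperedge messages (mean of member embeddings).
    edge_msgs = [
        [sum(X[i][d] for i in edge) / len(edge) for d in range(dim)]
        for edge in hyperedges
    ]
    # Step 2: node update (mean over incident hyperedge messages).
    out = []
    for i in range(len(X)):
        incident = [m for m, e in zip(edge_msgs, hyperedges) if i in e]
        if not incident:
            out.append(X[i][:])  # isolated node keeps its embedding
            continue
        out.append([sum(m[d] for m in incident) / len(incident)
                    for d in range(dim)])
    return out
```

Because a hyperedge spans many molecules at once, one round already mixes information across the whole group, which is the higher-order correlation a pairwise graph edge cannot express.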
Rong Yin
Associate Researcher, Institute of Information Engineering, Chinese Academy of Sciences
Topics: LLM, Graph Representation Learning, Statistical Learning Theory

Ruyue Liu
Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China; School of Cyberspace Security, University of Chinese Academy of Sciences, Beijing 100049, China

Xiaoshuai Hao
Beijing Academy of Artificial Intelligence (BAAI)
Topics: vision and language

Xingrui Zhou
Xidian University, Xi’an 710126, China

Yong Liu
Renmin University of China, Beijing 100872, China

Can Ma
Unknown affiliation

Weiping Wang
School of Information Science and Engineering, Central South University
Topics: Computer Network, Network Security