Semantic Item Graph Enhancement for Multimodal Recommendation

📅 2025-08-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing multimodal recommendation methods construct modality-specific item-item semantic graphs that suffer from two key limitations: insufficient modeling of collaborative signals among items, and structural distortion induced by noise in raw modality features. To address these issues, the paper proposes the Collaborative Enhancement and Anti-noise Alignment (CEAA) framework. First, it injects collaborative signals from the user-item interaction graph into each modality-specific semantic graph to strengthen semantic modeling. Second, it introduces a modulus-guided personalized embedding perturbation mechanism that generates contrastive views, coupled with anchor-based two-stage representation alignment and anti-noise contrastive learning (InfoNCE), to learn noise-robust and mutually consistent representations. Experiments on four benchmark datasets support the effectiveness of the framework.

📝 Abstract

Multimodal recommendation systems have attracted increasing attention for their improved performance by leveraging items' multimodal information. Prior methods often build modality-specific item-item semantic graphs from raw modality features and use them as supplementary structures alongside the user-item interaction graph to enhance user preference learning. However, these semantic graphs suffer from semantic deficiencies, including (1) insufficient modeling of collaborative signals among items and (2) structural distortions introduced by noise in raw modality features, ultimately compromising performance. To address these issues, we first extract collaborative signals from the interaction graph and infuse them into each modality-specific item semantic graph to enhance semantic modeling. Then, we design a modulus-based personalized embedding perturbation mechanism that injects perturbations with modulus-guided personalized intensity into embeddings to generate contrastive views. This enables the model to learn noise-robust representations through contrastive learning, thereby reducing the effect of structural noise in semantic graphs. Besides, we propose a dual representation alignment mechanism that first aligns multiple semantic representations via a designed Anchor-based InfoNCE loss using behavior representations as anchors, and then aligns behavior representations with the fused semantics by standard InfoNCE, to ensure representation consistency. Extensive experiments on four benchmark datasets validate the effectiveness of our framework.
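
The first step in the abstract, infusing collaborative signals into a modality-specific item graph, can be illustrated with a small sketch. The abstract does not say how the signals are extracted or fused, so everything below is an assumption made for illustration: co-interaction cosine similarity stands in for the collaborative signal, a linear blend stands in for the infusion, and the names `modality_similarity`, `collaborative_similarity`, `infuse`, and `alpha` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def modality_similarity(features):
    """Item-item edge weights from raw modality features (dict: item -> vector)."""
    return {(i, j): cosine(features[i], features[j])
            for i, j in combinations(sorted(features), 2)}


def collaborative_similarity(interactions):
    """Item-item edge weights from interactions (dict: user -> set of items).

    Assumed signal: cosine over user sets, |U_i & U_j| / sqrt(|U_i| * |U_j|).
    """
    users_of = {}
    for user, items in interactions.items():
        for item in items:
            users_of.setdefault(item, set()).add(user)
    weights = {}
    for i, j in combinations(sorted(users_of), 2):
        shared = len(users_of[i] & users_of[j])
        weights[(i, j)] = shared / math.sqrt(len(users_of[i]) * len(users_of[j]))
    return weights


def infuse(sim_modality, sim_collab, alpha=0.5):
    """Blend both edge-weight maps; alpha is a hypothetical mixing weight."""
    pairs = set(sim_modality) | set(sim_collab)
    return {p: (1 - alpha) * sim_modality.get(p, 0.0) + alpha * sim_collab.get(p, 0.0)
            for p in pairs}


# Toy data: items "a" and "c" look alike, but users co-consume "a" and "b".
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.0]}
interactions = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"c"}}
fused = infuse(modality_similarity(features), collaborative_similarity(interactions))
```

In this toy case the modality graph alone gives the pair ("a", "b") zero weight, while the blended graph gives it a nonzero weight because the same users interact with both items. That is the kind of item-level collaborative evidence the abstract says raw-feature graphs miss.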

Problem

Research questions and friction points this paper is trying to address.

Enhancing item semantic graphs with collaborative signals

Reducing structural noise in semantic graphs via contrastive learning

Aligning multiple semantic and behavior representations consistently
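
For the structural-noise point above, the abstract describes perturbations whose intensity is guided by each embedding's modulus. The exact rule is not given here, so the sketch below assumes the simplest reading: the noise offset has length `eps` times the L2 norm of the embedding it perturbs. The function name `perturb` and the parameter `eps` are hypothetical.

```python
import math
import random


def l2_norm(v):
    """Euclidean length (modulus) of a vector."""
    return math.sqrt(sum(x * x for x in v))


def perturb(embedding, eps=0.1, rng=None):
    """Return a noisy view whose offset has length eps * ||embedding||.

    Assumption: intensity is proportional to the embedding's modulus, so
    each embedding receives its own noise scale.
    """
    if rng is None:
        rng = random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in embedding]
    noise_len = l2_norm(noise)
    if noise_len == 0.0:
        return list(embedding)
    scale = eps * l2_norm(embedding) / noise_len
    return [x + scale * d for x, d in zip(embedding, noise)]


# Two independently perturbed views of the same embedding form a
# positive pair for contrastive learning.
rng = random.Random(0)
item_embedding = [3.0, 4.0]  # modulus 5.0
view_a = perturb(item_embedding, eps=0.1, rng=rng)
view_b = perturb(item_embedding, eps=0.1, rng=rng)
```

Scaling the offset by the modulus keeps the relative disturbance the same for every embedding, whatever its length. Whether the paper scales up or down with the modulus is not stated in the abstract.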

Innovation

Methods, ideas, or system contributions that make the work stand out.

Infusing collaborative signals into modality-specific graphs

Modulus-based personalized embedding perturbation mechanism

Dual representation alignment with Anchor-based InfoNCE
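
The dual alignment idea can be sketched with the standard InfoNCE loss, in which each query should match the key at the same index and every other key in the batch acts as a negative. The paper's Anchor-based InfoNCE is not specified in the abstract, so the first stage below is only an assumed stand-in that sums standard InfoNCE between the behavior anchors and each modality. The second stage aligns behavior with a fused representation, here assumed to be a plain mean. `two_stage_alignment` and `tau` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def info_nce(queries, keys, tau=0.2):
    """Standard InfoNCE: queries[i] is positive with keys[i], negative with the rest."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / tau for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)


def two_stage_alignment(behavior, modalities, tau=0.2):
    """Assumed two-stage loss: anchor each modality to behavior, then align the fusion.

    behavior:   list of per-item behavior vectors (used as anchors)
    modalities: dict modality name -> list of per-item vectors, same item order
    """
    # Stage 1 (stand-in for Anchor-based InfoNCE): behavior anchors vs. each modality.
    stage1 = sum(info_nce(behavior, reps, tau) for reps in modalities.values())
    # Stage 2: behavior vs. fused semantics (mean over modalities, an assumption).
    n_mod = len(modalities)
    fused = [[sum(vals) / n_mod for vals in zip(*rows)]
             for rows in zip(*modalities.values())]
    stage2 = info_nce(behavior, fused, tau)
    return stage1 + stage2


behavior = [[1.0, 0.0], [0.0, 1.0]]
aligned = {"visual": [[1.0, 0.0], [0.0, 1.0]], "text": [[1.0, 0.0], [0.0, 1.0]]}
swapped = {"visual": [[0.0, 1.0], [1.0, 0.0]], "text": [[0.0, 1.0], [1.0, 0.0]]}
loss_aligned = two_stage_alignment(behavior, aligned)
loss_swapped = two_stage_alignment(behavior, swapped)
```

With these toy vectors, the loss is small when each item's modality representations point the same way as its behavior anchor and large when they are swapped between items. This is the consistency pressure the alignment step is meant to apply.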
