MMM4Rec: A Transfer-Efficient Framework for Multi-modal Sequential Recommendation

📅 2025-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address high fine-tuning costs and negative transfer in cross-domain multi-modal sequential recommendation, this paper proposes MMM4Rec, a lightweight and robust transfer learning framework. Methodologically, it introduces: (1) an algebraic constraint mechanism that keeps semantics consistent across domains; (2) a Cross-SSD temporal fusion module, built on State Space Duality (SSD), a state space model (SSM) formulation, to capture long-range dependencies; (3) dual-channel Fourier adaptive filtering to suppress cross-modal noise propagation; and (4) shared projection matrices within a constrained two-stage process for cross-modal alignment. In the reported experiments, the framework improves NDCG@10 by up to 31.78% over existing models and converges about 10× faster on average when fine-tuned on large-scale downstream datasets.
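For readers unfamiliar with the headline metric, here is a minimal sketch of NDCG@10 and of a relative improvement figure. It assumes the common sequential-recommendation setup with a single held-out target item per user and binary relevance; the summary does not state the paper's exact evaluation protocol, and the function names `ndcg_at_k` and `relative_improvement` are illustrative only.

```python
import math


def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k for one held-out target item (binary relevance).

    With a single relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 2), where rank is the 0-based position of the target.
    """
    top_k = list(ranked_items)[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target)
    return 1.0 / math.log2(rank + 2)


def relative_improvement(new_score, baseline_score):
    """Relative gain of new_score over baseline_score, in percent."""
    return 100.0 * (new_score - baseline_score) / baseline_score


# Hypothetical ranking for one user; item 42 is the held-out target.
ranking = [17, 42, 5, 8, 99]
score = ndcg_at_k(ranking, target=42)  # target sits at 0-based rank 1
```

A dataset-level NDCG@10 would be the mean of this per-user score; a reported "improvement" is then the relative gain between two such means.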

📝 Abstract

Sequential Recommendation (SR) systems model user preferences by analyzing interaction histories. Although transferable multi-modal SR architectures outperform traditional ID-based approaches, current methods incur substantial fine-tuning costs when adapting to new domains because of complex optimization requirements and negative transfer effects. This is a significant deployment bottleneck: it prevents engineers from efficiently repurposing pre-trained models for new application scenarios with minimal tuning overhead. We propose MMM4Rec (Multi-Modal Mamba for Sequential Recommendation), a novel multi-modal SR framework that incorporates a dedicated algebraic constraint mechanism for efficient transfer learning. By combining the temporal decay properties of State Space Duality (SSD) with a time-aware modeling design, our model dynamically prioritizes key modality information, overcoming limitations of Transformer-based approaches. The framework implements a constrained two-stage process: (1) sequence-level cross-modal alignment via shared projection matrices, followed by (2) temporal fusion using our newly designed Cross-SSD module and dual-channel Fourier adaptive filtering. This architecture maintains semantic consistency while suppressing noise propagation. MMM4Rec achieves rapid fine-tuning convergence with a simple cross-entropy loss, significantly improving multi-modal recommendation accuracy while maintaining strong transferability. Extensive experiments demonstrate MMM4Rec's state-of-the-art performance: it improves NDCG@10 by up to 31.78% over existing models and converges on average 10 times faster when transferring to large-scale downstream datasets.
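To make the "temporal decay" and "duality" ideas concrete, the following is a minimal sketch of a generic scalar state space recurrence with time-aware decay, together with its equivalent masked-matrix ("dual") form. It is not the paper's Cross-SSD module. The decay rule `a_t = exp(-rate * gap_t)`, the `rate` value, and all function and variable names are assumptions made for illustration.

```python
import numpy as np


def ssm_recurrent(x, a, b, c):
    """Linear recurrence: h_t = a_t * h_{t-1} + b_t * x_t,  y_t = c_t * h_t."""
    h = 0.0
    y = np.empty_like(x)
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] * h
    return y


def ssm_dual(x, a, b, c):
    """Same map written as y = M @ x with a lower-triangular matrix M.

    M[i, j] = c_i * (a_{j+1} * ... * a_i) * b_j for j <= i, else 0.
    Requires a > 0 so the decay products can be formed in log space.
    """
    cum = np.cumsum(np.log(a))
    decay = np.exp(cum[:, None] - cum[None, :])  # product of a over (j, i]
    M = np.tril(decay) * np.outer(c, b)          # causal mask, then scale
    return M @ x


# Hypothetical inputs: six interactions with irregular time gaps.
rng = np.random.default_rng(0)
x = rng.normal(size=6)
gaps = np.array([1.0, 0.5, 3.0, 0.2, 7.0, 1.0])  # assumed time since previous event
rate = 0.3                                        # assumed decay rate
a = np.exp(-rate * gaps)                          # longer gap -> stronger forgetting
b = rng.normal(size=6)
c = rng.normal(size=6)

y_rec = ssm_recurrent(x, a, b, c)
y_mat = ssm_dual(x, a, b, c)
```

The two functions compute the same outputs: the recurrent form runs in linear time step by step, while the matrix form resembles a causally masked attention map whose weights decay with elapsed time.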

Problem

Research questions and friction points this paper is trying to address.

Reducing fine-tuning costs for multi-modal sequential recommendation models

Overcoming negative transfer effects in model adaptation

Improving transfer efficiency while maintaining recommendation accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Algebraic constraint mechanism for efficient transfer

State Space Duality with time-aware modeling

Cross-SSD module and dual-channel Fourier filtering
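As a rough illustration of frequency-domain filtering on an item sequence, the sketch below applies an element-wise filter to the spectrum of a sequence of embeddings along the time axis. It shows a single generic channel only. How the paper defines its two channels and makes the filter adaptive is not described in this summary, so the fixed `weight` array here is a stand-in for learned parameters, and `fourier_filter` is a hypothetical name.

```python
import numpy as np


def fourier_filter(seq, weight):
    """Filter a (T, D) real sequence in the frequency domain along time.

    weight has shape (T // 2 + 1, D) and scales each frequency bin of each
    embedding dimension; in a trained model it would be learned.
    """
    spec = np.fft.rfft(seq, axis=0)                       # (T//2 + 1, D) complex
    return np.fft.irfft(spec * weight, n=seq.shape[0], axis=0)


# Hypothetical sequence: 8 time steps, 4 embedding dimensions.
rng = np.random.default_rng(0)
seq = rng.normal(size=(8, 4))

# All-pass filter: leaves the sequence unchanged.
all_pass = np.ones((8 // 2 + 1, 4))
unchanged = fourier_filter(seq, all_pass)

# Keep only the zero-frequency bin: every step becomes the per-dimension mean.
dc_only = np.zeros((8 // 2 + 1, 4))
dc_only[0] = 1.0
smoothed = fourier_filter(seq, dc_only)
```

Damping high-frequency bins in this way smooths abrupt step-to-step changes, which is one common motivation for using such filters to reduce noise in interaction sequences.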
