Bridging Short Videos and Live Streams: Reasoning-Guided Multimodal LLMs for Cross-Domain Representation Learning

📅 2026-06-03
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the cold-start problem in live-stream recommendation caused by sparse user behavior by proposing the RGCD-Rep framework, which pioneers the use of reasoning-guided multimodal large language models (MLLMs) for cross-domain interest transfer from short videos to live streams. The approach leverages a frozen teacher MLLM to generate structured reasoning knowledge and employs a lightweight student MLLM trained in two stages to achieve efficient knowledge distillation and decompose representations into transferable and domain-specific residual components. Designed for industrial deployment, the framework supports offline representation computation and seamless integration with retrieval systems. Extensive experiments demonstrate that the method significantly outperforms baseline approaches in offline evaluations and achieves substantial gains across multiple key metrics in A/B tests on Kuaishou's live-stream recommendation system, serving over 400 million users daily.
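To make the "offline representation computation plus retrieval" idea concrete, here is a minimal sketch in plain Python. It is not the paper's pipeline: the item names, the 2-dimensional vectors, the mean-pooled user vector, and the cosine-similarity ranking are all illustrative assumptions. The summary only states that representations are computed offline and plugged into retrieval.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def retrieve(user_vec, item_table, k):
    """Return the ids of the k items whose precomputed vectors best match user_vec."""
    ranked = sorted(item_table.items(),
                    key=lambda kv: cosine(user_vec, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

# Hypothetical live-stream item vectors, assumed to be computed offline.
live_items = {
    "live_cooking": [0.9, 0.1],
    "live_gaming": [0.1, 0.9],
    "live_music": [0.6, 0.6],
}

# Hypothetical vectors of short videos the user watched (assumption: the user
# interest vector is simply their mean; the source does not specify this).
watched_short_videos = [[1.0, 0.0], [0.8, 0.2]]
user_vec = [sum(col) / len(watched_short_videos)
            for col in zip(*watched_short_videos)]

top2 = retrieve(user_vec, live_items, k=2)
print(top2)
```

Because the item table is built ahead of time, no MLLM call is needed at lookup time in this sketch, which mirrors the low-cost deployment property the summary describes.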
πŸ“ Abstract
As live streaming services grow, many platforms offer short videos and live streams to meet diverse needs. Short videos carry substantial traffic and rich behavior signals, whereas live streaming is a core conversion scenario with sparse behavior data, making cold start severe. Transferring user interests from short videos to live streaming recommendation can alleviate these issues. Meanwhile, short videos and live streams are complex multimodal items, and integrating multimodal signals improves recommendation performance. Although Multimodal Large Language Models (MLLMs) show strong multimodal understanding and reasoning, their application to cross-domain recommendation remains underexplored. To this end, we propose Reasoning-Guided Cross-Domain Representation Learning (RGCD-Rep), a reasoning-guided framework for cross-domain recommendation from short videos to live streams. RGCD-Rep introduces MLLM reasoning resource-efficiently and learns transferable item representations guided by behavioral collaboration via two-stage training. First, reasoning-aware distillation lets a frozen teacher MLLM generate structured cross-domain reasoning knowledge and distills it into a lightweight student MLLM. Second, transferability-guided cross-domain representation learning decomposes item representations into transferable and domain residual representations. The resulting representations are computed offline and integrated into downstream retrieval tasks, enabling low-cost industrial deployment. Extensive offline experiments demonstrate RGCD-Rep's superiority. After deployment in Kuaishou's live streaming recommendation system, A/B tests show significant gains across multiple core business metrics, confirming its effectiveness and practicality in real industrial scenarios. RGCD-Rep is fully deployed and serves over 400 million users daily.
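One way to picture the split into transferable and domain residual representations is an orthogonal projection: the part of an item vector that lies in a subspace shared by both domains is treated as transferable, and the remainder as the domain residual. The sketch below rests on that assumption only. The abstract does not say how RGCD-Rep performs the decomposition, and the basis, the vector values, and the function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decompose(item_vec, shared_basis):
    """Split item_vec into (transferable, residual).

    transferable: projection of item_vec onto the span of shared_basis,
                  which is assumed to be orthonormal.
    residual:     item_vec minus that projection.
    """
    transferable = [0.0] * len(item_vec)
    for basis_vec in shared_basis:
        coeff = dot(item_vec, basis_vec)
        transferable = [t + coeff * b for t, b in zip(transferable, basis_vec)]
    residual = [v - t for v, t in zip(item_vec, transferable)]
    return transferable, residual

# Hypothetical setup: the first two axes are shared across short videos and
# live streams, the third axis is specific to the live-stream domain.
shared_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
live_item = [0.6, 0.3, 0.8]

transferable, residual = decompose(live_item, shared_basis)
print(transferable, residual)
```

In this toy setting the two parts add back up to the original vector and are orthogonal to each other, so only the transferable part would be matched against short-video interests while the residual stays available for in-domain signals.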
Problem

Research questions and friction points this paper is trying to address.

cross-domain recommendation
cold start
multimodal learning
live streaming
short videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Models
Cross-Domain Recommendation
Reasoning-Guided Learning
Knowledge Distillation
Representation Decomposition