LLM-Based User Personas for Recommendations at Scale

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing LLM-based recommendation approaches: they often rely on structured IDs or offline processing, which limits semantic richness, real-time adaptability, and interpretability. It introduces a framework that uses a large language model (LLM) to generate natural-language user interest personas in real time for a large-scale commercial video recommendation platform. The personas are generated directly during serving and combine a summary of the user's existing interests with novel topics, which addresses the exploitation-exploration trade-off. To make online LLM inference affordable at billion-user scale, the architecture relies on knowledge distillation, asynchronous inference, and input optimization via semantically clustered video representations. Offline evaluations, user studies, and live A/B tests report significant improvements in viewer value.

📝 Abstract

Large Language Models (LLMs) offer unprecedented potential for enhancing recommendation systems through their world knowledge and reasoning capabilities. However, existing approaches often rely on structured IDs or offline processing, limiting semantic richness, real-time adaptability, and user-facing interpretability. In this paper, we introduce a novel framework that enables real-time generation of LLM-based user interest personas for a large-scale commercial video recommendation platform. Our method generates natural-language user interest personas that address the exploitation-exploration trade-off by combining the summarization of existing interests with novel topics, directly during serving. To overcome the computational challenges of online LLM inference at a billion-user scale, we design a cost-efficient architecture leveraging knowledge distillation, asynchronous inference, and input optimization via semantically clustered video representations. Extensive offline evaluations, user studies, and live A/B tests demonstrate significant improvements in viewer value. This work bridges the gap between high-level semantic understanding and industrial-scale recommendation, paving the way for more dynamic, explainable, and satisfying personalized experiences.
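The abstract names two serving-side ideas that are easy to picture in code: shortening the LLM input by replacing individual videos with semantic cluster labels, and keeping LLM calls off the request path through asynchronous inference. The Python sketch below is only an illustration under stated assumptions and is not the paper's implementation. `VIDEO_TO_CLUSTER`, `compress_history`, `build_prompt`, `fake_llm`, and `PersonaCache` are all hypothetical names, the cluster labels are made up, and knowledge distillation is not shown (the stub `fake_llm` stands in for whatever distilled model would be served).

```python
import asyncio
from collections import Counter

# Hypothetical mapping from video ID to a semantic cluster label (made-up data).
VIDEO_TO_CLUSTER = {
    "v1": "home espresso",
    "v2": "home espresso",
    "v3": "trail running",
    "v4": "sourdough baking",
}


def compress_history(video_ids, top_k=2):
    """Collapse a watch history into its top_k most frequent cluster labels."""
    counts = Counter(
        VIDEO_TO_CLUSTER[v] for v in video_ids if v in VIDEO_TO_CLUSTER
    )
    return [label for label, _ in counts.most_common(top_k)]


def build_prompt(clusters):
    """Build a short prompt asking for known interests plus novel topics."""
    return (
        "Recent interest clusters: " + ", ".join(clusters) + ". "
        "Summarize these interests and suggest one novel topic."
    )


async def fake_llm(prompt):
    """Stand-in for a model call; a real system would query a served model."""
    await asyncio.sleep(0)
    return f"persona for: {prompt}"


class PersonaCache:
    """Serve the last known persona immediately and refresh it in the background."""

    def __init__(self, generate):
        self._generate = generate
        self._personas = {}  # user_id -> latest persona text
        self._pending = {}   # user_id -> in-flight refresh task

    def get(self, user_id, video_ids):
        # Schedule at most one refresh per user; never block the caller on it.
        if user_id not in self._pending:
            task = asyncio.create_task(self._refresh(user_id, video_ids))
            self._pending[user_id] = task
        return self._personas.get(user_id)  # None until the first refresh lands

    async def _refresh(self, user_id, video_ids):
        try:
            prompt = build_prompt(compress_history(video_ids))
            self._personas[user_id] = await self._generate(prompt)
        finally:
            self._pending.pop(user_id, None)

    async def drain(self):
        """Wait for all in-flight refreshes (useful in tests and on shutdown)."""
        await asyncio.gather(*list(self._pending.values()))


async def main():
    cache = PersonaCache(fake_llm)
    history = ["v1", "v2", "v3"]
    print(cache.get("u1", history))  # no persona yet; refresh is scheduled
    await cache.drain()
    print(cache.get("u1", history))  # persona from the completed refresh
    await cache.drain()


if __name__ == "__main__":
    asyncio.run(main())
```

In this sketch the first request for a user gets no persona and falls back to whatever the recommender would do without one; later requests read the cached text while a new refresh runs in the background. Whether the real system tolerates that staleness, and how it derives its clusters, is not stated in the abstract.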

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Recommendation Systems

User Personas

Real-time Adaptability

Semantic Richness

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-Based User Personas

Real-Time Recommendation

Knowledge Distillation