LifeSide: Benchmarking Agents as Lifelong Digital Companions

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current evaluation methods struggle to assess whether digital companion agents can integrate cross-session cues, maintain a consistent understanding of the user, and adapt to evolving privacy boundaries over long-term interactions. This work proposes a multidimensional evaluation framework tailored to lifelong digital companionship: a benchmark built on multi-session loops that couple memory, emotion, and environmental dynamics. Using multi-agent simulation, it models users as persistent personas with hierarchical profiles and event trajectories. Experiments across 2,000 personas and 111,000 tasks show that even models that excel on existing memory benchmarks fail to sustain accurate user modeling and authentic companionship over time, exposing fundamental limits in current systems' capacity for long-term interaction.

📝 Abstract

Lifelong digital companions must integrate cross-session cues, continually update their understanding of users, and adapt to shifting privacy boundaries. Existing evaluations fail to capture this because they test memory recall and short-term empathy in isolation. To bridge this gap, we introduce LifeSide, a benchmark centered on multi-session Memory-Emotion-Environment loops. By modeling users as persistent worlds with layered profiles and event trajectories, LifeSide uses multi-agent simulation to project environmental dynamics into dialogue, preserving the critical gap between latent thoughts and observable expressions. Across 2,000 personas and 111K tasks spanning memory tracking, user understanding, privacy control, and emotional companionship, our experimental results reveal a stark reality: even models that saturate current memory benchmarks fail to sustain accurate user understanding and true companionship over long horizons.
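
As a rough illustration of the user model described in the abstract, the sketch below shows one way a persistent persona with a layered profile and an event trajectory might be represented, keeping latent thoughts separate from what is actually expressed in dialogue. This is not the paper's implementation: every class, field, and function name here (`Event`, `Persona`, `observable_history`, the layer names) is a hypothetical choice made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One step in a persona's event trajectory (hypothetical schema)."""
    session: int    # index of the session in which the event surfaces
    latent: str     # what the simulated user privately thinks (hidden from the agent)
    expressed: str  # what the simulated user actually says in dialogue


@dataclass
class Persona:
    """A persistent simulated user; layer names are illustrative assumptions."""
    name: str
    profile: dict[str, dict[str, str]] = field(default_factory=dict)
    trajectory: list[Event] = field(default_factory=list)


def observable_history(persona: Persona, up_to_session: int) -> list[str]:
    """Return only the utterances an agent could have seen through a given session."""
    return [e.expressed for e in persona.trajectory if e.session <= up_to_session]


persona = Persona(
    name="example-user",
    profile={"stable": {"occupation": "nurse"}, "evolving": {"mood": "tired"}},
    trajectory=[
        Event(1, latent="I am worried about my job.", expressed="Work has been busy."),
        Event(3, latent="I might quit.", expressed="I have been thinking about a change."),
    ],
)

print(observable_history(persona, up_to_session=2))  # ['Work has been busy.']
```

Keeping `latent` and `expressed` as separate fields means an evaluator can score an agent's inferences against the hidden state while the agent itself only ever receives the expressed text.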

Problem

Research questions and friction points this paper is trying to address.

lifelong digital companions

cross-session memory

user understanding

privacy boundaries

emotional companionship

Innovation

Methods, ideas, or system contributions that make the work stand out.

lifelong digital companion

multi-session benchmark

Memory-Emotion-Environment loop