Reasoning LLMs for User-Aware Multimodal Conversational Agents

📅 2025-04-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the cold-start problem—i.e., unknown user preferences during initial interaction—in social robotics for elderly users, this paper proposes a zero-shot user-aware dialogue framework. Methodologically, it introduces a novel dynamic user profiling mechanism: a vision-language model (VLM) initializes the profile from the first multimodal input frame; chain-of-thought (CoT) reasoning iteratively refines it; and retrieval-augmented generation (RAG) enables context-adaptive responses. Additionally, a privacy-preserving federated prompting framework ensures data security. Evaluated on ElderlyTech-VQA Bench, the approach achieves a 23.2% improvement in ROUGE-1 score. Human evaluations demonstrate significant gains in elderly users’ trust, engagement duration, and a 37% reduction in response bias. This work establishes a scalable, interpretable, and user-friendly paradigm for personalized human–robot interaction under cold-start conditions.

Technology Category

Application Category

📝 Abstract
Personalization in social robotics is critical for fostering effective human-robot interactions, yet systems often face the cold start problem, where initial user preferences or characteristics are unavailable. This paper proposes a novel framework called USER-LLM R1 for a user-aware conversational agent that addresses this challenge through dynamic user profiling and model initiation. Our approach integrates chain-of-thought (CoT) reasoning models to iteratively infer user preferences and vision-language models (VLMs) to initialize user profiles from multimodal inputs, enabling personalized interactions from the first encounter. Leveraging a Retrieval-Augmented Generation (RAG) architecture, the system dynamically refines user representations within an inherent CoT process, ensuring contextually relevant and adaptive responses. Evaluations on the ElderlyTech-VQA Bench demonstrate significant improvements in ROUGE-1 (+23.2%), ROUGE-2 (+0.6%), and ROUGE-L (+8%) F1 scores over state-of-the-art baselines, with ablation studies underscoring the impact of reasoning model size on performance. Human evaluations further validate the framework's efficacy, particularly for elderly users, where tailored responses enhance engagement and trust. Ethical considerations, including privacy preservation and bias mitigation, are rigorously discussed and addressed to ensure responsible deployment.
Problem

Research questions and friction points this paper is trying to address.

Addresses cold start in social robotics via dynamic user profiling
Integrates reasoning and vision models for personalized interactions
Enhances engagement with tailored responses for elderly users
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic user profiling with CoT reasoning models
Vision-language models for initial user profiles
Retrieval-Augmented Generation for adaptive responses
🔎 Similar Papers
No similar papers found.