FACE: A Fine-grained Reference Free Evaluator for Conversational Recommender Systems

📅 2025-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Automated evaluation of Conversational Recommender Systems (CRSs) suffers from weak modeling of conversation dynamics, heavy reliance on human-written reference responses, and poor interpretability. To address these challenges, the authors propose FACE, a reference-free, fine-grained, multi-dimensional automatic evaluation framework that enables the first decoupled assessment of recommendation quality at both the turn level and the dialogue level. Methodologically, FACE combines large language model (LLM)-driven multi-faceted prompt engineering with a zero-shot evaluation decomposition mechanism, eliminating dependence on gold-standard responses. Experiments show that FACE achieves a system-level correlation with human judgments of 0.9, outperforming state-of-the-art methods by a large margin, along with turn-level and dialogue-level correlations of 0.5. It further supports precise error attribution and localization while reducing evaluation cost by over 90%. This work addresses the "black-box" limitation of LLM-based evaluation and offers an approach to CRS assessment that is reliable, cost-efficient, and interpretable.
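The idea of fine-grained scores that support error localization can be illustrated with a minimal sketch. It is not the FACE implementation: the aspect names (`relevance`, `clarity`), the 0 to 1 score scale, and the numbers are hypothetical stand-ins for whatever an LLM judge would return per turn.

```python
from statistics import mean

# Hypothetical per-turn aspect scores (made-up numbers, assumed 0..1 scale).
# In a real pipeline these would come from an LLM judge, one call per aspect.
turn_scores = [
    {"relevance": 0.9, "clarity": 0.8},
    {"relevance": 0.2, "clarity": 0.7},
    {"relevance": 0.8, "clarity": 0.6},
]


def aspect_averages(turns):
    """Aggregate turn-level scores into one dialogue-level score per aspect."""
    aspects = turns[0].keys()
    return {a: mean(t[a] for t in turns) for a in aspects}


def weakest_spot(turns):
    """Return (turn_index, aspect, score) for the lowest-scoring cell."""
    cells = ((i, a, s) for i, t in enumerate(turns) for a, s in t.items())
    return min(cells, key=lambda cell: cell[2])


if __name__ == "__main__":
    print(aspect_averages(turn_scores))
    print(weakest_spot(turn_scores))
```

Because scores stay separated by aspect and by turn, a low dialogue-level value can be traced back to the specific turn and aspect that caused it, which a single overall score cannot do.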

📝 Abstract

A systematic, reliable, and low-cost evaluation of Conversational Recommender Systems (CRSs) remains an open challenge. Existing automatic CRS evaluation methods have proven insufficient for evaluating the dynamic nature of recommendation conversations. This work proposes FACE: a Fine-grained, Aspect-based Conversation Evaluation method that provides evaluation scores for diverse turn-level and dialogue-level qualities of recommendation conversations. FACE is reference-free and shows strong correlation with human judgments, achieving a system-level correlation of 0.9 and turn- and dialogue-level correlations of 0.5, outperforming state-of-the-art CRS evaluation methods by a large margin. Additionally, unlike existing LLM-based methods that provide single uninterpretable scores, FACE provides insights into system performance and enables identifying and locating problems within conversations.
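The gap between the system-level and dialogue-level figures is easier to read once the two granularities are spelled out. The sketch below is an illustration under stated assumptions: it uses Pearson correlation (the summary does not say which coefficient the paper reports), and the systems, metric scores, and human scores are made-up toy data.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data: (system, metric_score, human_score), one row per dialogue.
records = [
    ("A", 0.2, 0.4), ("A", 0.4, 0.2),
    ("B", 0.5, 0.7), ("B", 0.7, 0.5),
    ("C", 0.8, 1.0), ("C", 1.0, 0.8),
]


def dialogue_level(rows):
    """Correlate metric and human scores across individual dialogues."""
    return pearson([m for _, m, _ in rows], [h for _, _, h in rows])


def system_level(rows):
    """Average scores per system first, then correlate the system means."""
    systems = sorted({s for s, _, _ in rows})
    metric_means = [mean(m for s, m, _ in rows if s == name) for name in systems]
    human_means = [mean(h for s, _, h in rows if s == name) for name in systems]
    return pearson(metric_means, human_means)


if __name__ == "__main__":
    print("dialogue-level:", round(dialogue_level(records), 3))
    print("system-level:", round(system_level(records), 3))
```

On this toy data the system-level correlation is higher than the dialogue-level one, because averaging per system cancels out disagreement on individual dialogues. This is one common reason system-level figures exceed finer-grained ones.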

Problem

Research questions and friction points this paper is trying to address.

Evaluating the dynamic nature of CRS conversations reliably and at low cost

Providing fine-grained, aspect-based scores for turn-level and dialogue-level conversation quality

Identifying system performance issues without human-written references

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained, aspect-based conversation evaluation method

Reference-free, with strong correlation to human judgments

Interpretable scores that help identify and locate problems within conversations
