BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic evaluation of conversational sycophancy in non-English cultural contexts. It introduces BenSyc, the first benchmark for conversational sycophancy tailored to Bengali social settings. The benchmark is built from 11,840 Reddit posts and 170,000 comments collected from communities across Bangladesh and West Bengal, and is human-validated with both binary labels and a fine-grained five-level taxonomy (Invalidation, Neutral, Support, Validation, and Escalation). The work also proposes the first evaluation framework for conversational sycophancy applicable to non-English languages. An assessment of more than 15 open and proprietary large language models shows that they struggle to distinguish empathetic support from excessive validation. The best-performing model achieves Macro-F1 scores of only 61.8 on binary detection and 61.7 on five-class classification. In generation experiments, several models frequently produce strongly validating or escalatory responses in emotionally charged scenarios, highlighting a significant gap in cultural alignment.

📝 Abstract

Large language models (LLMs) increasingly participate in emotionally sensitive social conversations, where responses may shift from balanced support toward excessive validation or escalatory alignment. Existing sycophancy research primarily focuses on factual agreement and instruction-following settings, leaving culturally grounded conversational sycophancy underexplored. We introduce BenSyc, the first benchmark for studying conversational sycophancy in Bengali social contexts. Starting from 11,840 Reddit posts and 170k comments collected from communities across Bangladesh and West Bengal, we construct a human-validated benchmark with binary labels and a fine-grained five-level taxonomy spanning Invalidation, Neutral, Support, Validation, and Escalation. We evaluate more than 15 open and proprietary LLMs on conversational alignment classification and response generation tasks. Results show that distinguishing empathetic support from reinforcement-oriented validation remains challenging even for frontier instruction-tuned models: the best system achieves only 61.8 Macro-F1 on binary detection and 61.7 Macro-F1 on five-class classification. In generation settings, several models frequently produce strongly validating or escalatory responses in emotionally charged situations. Our findings highlight substantial variation across model families and conversational behaviors, underscoring the importance of culturally grounded multilingual benchmarks for evaluating socially aligned conversational AI systems.
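The classification results above are reported as Macro-F1, the unweighted mean of per-class F1 scores, so each of the five levels counts equally regardless of how often it occurs. The following is a minimal sketch of that metric over the five-level taxonomy, not the paper's evaluation script. The function name `macro_f1`, the toy labels, and the choice to score a class with no gold or predicted instances as 0 are all assumptions made for illustration. The function returns a value in [0, 1], whereas the abstract reports scores on a 0 to 100 scale.

```python
from collections import Counter

# Five-level taxonomy named in the abstract.
LABELS = ["Invalidation", "Neutral", "Support", "Validation", "Escalation"]


def macro_f1(gold, pred, labels):
    """Return the unweighted mean of per-class F1 over `labels` (range 0 to 1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Assumption: a class with no gold and no predicted instances scores 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical toy annotations, not data from the benchmark.
gold = ["Support", "Support", "Validation", "Validation",
        "Escalation", "Neutral", "Invalidation"]
pred = ["Support", "Validation", "Validation", "Validation",
        "Validation", "Neutral", "Invalidation"]

score = macro_f1(gold, pred, LABELS)
print(f"Macro-F1: {100 * score:.1f}")
```

In this toy example the classifier never predicts Escalation, so that class contributes an F1 of 0 and pulls the macro average down even though most individual predictions are correct. This is why Macro-F1 is sensitive to errors on rare classes.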

Problem

Research questions and friction points this paper is trying to address.

conversational sycophancy

human alignment

Bengali contexts

emotionally sensitive conversations

multilingual benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

conversational sycophancy

culturally grounded benchmark

Bengali NLP