🤖 AI Summary
This study addresses a gap in how large language models (LLMs) are evaluated on subjective cultural questions. Existing evaluations emphasize factual accuracy and overlook how the expressive style of a response shapes user perception. To bridge this gap, the authors propose FRANZ, an automated framework that audits model responses along four dimensions: cultural positioning, use of generalizing language, anthropomorphic cues, and adherence to conversational maxims. They also introduce SQUARE, a corpus of 376,000 subjective questions sourced from 57 subreddits and mapped to 7 countries and 19 question categories. Experiments with three open-weight LLMs show statistically significant differences in how often the models use each of these traits. Insider (in-group) cultural positioning is also positively correlated with anthropomorphism, and the strength of this relationship varies across countries.
📝 Abstract
Large language models (LLMs) are increasingly used to answer subjective, information-seeking questions, where users are sensitive to how responses are communicated, not just whether the answers are correct. Existing LLM evaluations for subjective cultural queries largely focus on factual correctness and ignore how the response is framed. To address this gap, we introduce FRANZ, an automated FRAmework for respoNse characteriZation, to conduct a communicative audit of LLM responses along four dimensions: cultural positioning, use of generalizing language, anthropomorphic cues, and adherence to conversational maxims. To enable this evaluation, we contribute SQUARE, a corpus of 376k subjective questions sourced from 57 subreddits and mapped to 7 countries and 19 question categories. We demonstrate FRANZ's applicability by scoring responses from three open-weight LLMs. We observe that the LLMs show statistically significant differences in the frequency with which they employ each response characteristic. Unlike single-dimensional audits, FRANZ reveals that insider positioning and anthropomorphism are positively coupled, with the degree of coupling varying by country, providing a diagnostic lens for identifying framing divergences.
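The abstract does not say how FRANZ quantifies the coupling between insider positioning and anthropomorphism. As a minimal sketch, assume each scored response already carries two binary labels and a country. The field names `insider`, `anthropomorphic`, and `country` are hypothetical. Under that assumption, per-country coupling could be measured with the phi coefficient, which is Pearson correlation applied to two binary variables. This is a swapped-in illustrative statistic, not necessarily the authors' method.

```python
from collections import defaultdict
from math import sqrt


def phi_coefficient(pairs):
    """Phi coefficient between two binary variables, given (x, y) bool pairs.

    Illustrative choice of coupling statistic (an assumption, not FRANZ's).
    """
    n11 = n10 = n01 = n00 = 0  # 2x2 contingency counts
    for x, y in pairs:
        if x and y:
            n11 += 1
        elif x:
            n10 += 1
        elif y:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")  # undefined when either label is constant
    return (n11 * n00 - n10 * n01) / denom


def coupling_by_country(records):
    """Group hypothetical labeled responses by country; return phi per country."""
    groups = defaultdict(list)
    for r in records:
        groups[r["country"]].append((r["insider"], r["anthropomorphic"]))
    return {country: phi_coefficient(p) for country, p in groups.items()}


# Toy, made-up labels purely to exercise the functions (not data from SQUARE).
records = [
    {"country": "A", "insider": True, "anthropomorphic": True},
    {"country": "A", "insider": True, "anthropomorphic": True},
    {"country": "A", "insider": False, "anthropomorphic": False},
    {"country": "A", "insider": False, "anthropomorphic": True},
    {"country": "B", "insider": True, "anthropomorphic": False},
    {"country": "B", "insider": False, "anthropomorphic": True},
    {"country": "B", "insider": True, "anthropomorphic": True},
    {"country": "B", "insider": False, "anthropomorphic": False},
]
result = coupling_by_country(records)
print(result)
```

A positive value indicates that responses using insider positioning also tend to contain anthropomorphic cues. Comparing the values across countries gives one simple way to see whether the degree of coupling differs by country. Any significance testing would need to be added on top of this sketch.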