🤖 AI Summary
This study addresses the difficulty that large language models (LLMs) have in interpreting indirect pragmatic intent conveyed solely through nonverbal cues, such as facial expressions or gestures, without accompanying linguistic signals. It presents the first systematic evaluation of LLMs’ pragmatic reasoning under purely nonverbal conditions, pairing the evaluation with fine-grained error analysis and in-context learning experiments. Accuracy drops by up to 60 percentage points in nonverbal scenarios compared to verbal ones. In-context learning improves models’ comprehension of nonverbal pragmatic meaning, suggesting a promising route toward stronger social interaction competence in LLMs.
📝 Abstract
Although large language models (LLMs) have shown considerable progress in pragmatic language understanding, prior research has focused mainly on their comprehension of verbal behavior. Yet non-verbal behavior remains a fundamental component of human communication, especially when it is used deliberately and in isolation to convey indirect meanings. In this work, we present the first systematic evaluation of LLMs' ability to infer pragmatic meaning in dialogue consisting solely of non-verbal responses. We explore three research questions: (1) Can LLMs recognize indirect intent conveyed through non-verbal responses? (2) When and how do LLMs fail to capture non-verbal intent? (3) How can we improve LLMs' ability to interpret non-verbal intent? Our evaluation shows that LLMs struggle to infer underlying meaning from non-verbal responses, with accuracy dropping by up to 60 percentage points compared to verbal ones. Further analysis reveals a behavioral pattern in LLMs' interpretations of non-verbal behavior and demonstrates that in-context learning facilitates pragmatic inference.