🤖 AI Summary
Existing evaluations of large language models' pragmatic competence, such as implicature interpretation and coreference resolution, lack systematic coverage and depth. Method: We conduct a comprehensive survey of 127 pragmatic datasets and propose a three-dimensional mapping framework linking pragmatic phenomena, datasets, and evaluation dimensions; perform cross-dataset meta-analysis and task-paradigm categorization; and establish the first pragmatics evaluation standard grounded in real-world scenario adaptability. Contribution/Results: Our analysis reveals critical gaps in current benchmarks, particularly in dynamic context modeling and cross-cultural pragmatic inference, and identifies three fundamental evaluation bottlenecks. The study advances a next-generation pragmatic evaluation framework that is fine-grained, context-sensitive, and scalable, thereby providing both theoretical foundations and practical pathways for developing context-aware NLP models.
📝 Abstract
Understanding pragmatics, the use of language in context, is crucial for developing NLP systems capable of interpreting nuanced language use. Despite recent advances in language technologies, including large language models, evaluating their ability to handle pragmatic phenomena such as implicatures and references remains challenging. To advance pragmatic abilities in models, it is essential to understand current evaluation trends and identify existing limitations. In this survey, we provide a comprehensive review of resources designed for evaluating pragmatic capabilities in NLP, categorizing datasets by the pragmatic phenomena they address. We analyze task designs, data collection methods, evaluation approaches, and their relevance to real-world applications. By examining these resources in the context of modern language models, we highlight emerging trends, challenges, and gaps in existing benchmarks. Our survey aims to clarify the landscape of pragmatic evaluation and guide the development of more comprehensive and targeted benchmarks, ultimately contributing to more nuanced and context-aware NLP models.