🤖 AI Summary
Existing evaluations of large language models' pragmatic competence, such as implicature interpretation and coreference resolution, lack systematic coverage and depth. Method: We conduct a comprehensive survey of 127 pragmatic datasets and propose a three-dimensional mapping framework linking pragmatic phenomena, datasets, and evaluation dimensions; perform cross-dataset meta-analysis and task-paradigm categorization; and establish the first pragmatics evaluation standard grounded in real-world scenario adaptability. Contribution/Results: Our analysis reveals critical gaps in current benchmarks, particularly in dynamic context modeling and cross-cultural pragmatic inference, and identifies three fundamental evaluation bottlenecks. The study advances a next-generation pragmatic evaluation framework that is fine-grained, context-sensitive, and scalable, thereby providing both theoretical foundations and practical pathways for developing context-aware NLP models.
📝 Abstract
Understanding pragmatics, the use of language in context, is crucial for developing NLP systems capable of interpreting nuanced language use. Despite recent advances in language technologies, including large language models, evaluating their ability to handle pragmatic phenomena such as implicatures and references remains challenging. To advance pragmatic abilities in models, it is essential to understand current evaluation trends and identify existing limitations. In this survey, we provide a comprehensive review of resources designed for evaluating pragmatic capabilities in NLP, categorizing datasets by the pragmatic phenomena they address. We analyze task designs, data collection methods, evaluation approaches, and their relevance to real-world applications. By examining these resources in the context of modern language models, we highlight emerging trends, challenges, and gaps in existing benchmarks. Our survey aims to clarify the landscape of pragmatic evaluation and guide the development of more comprehensive and targeted benchmarks, ultimately contributing to more nuanced and context-aware NLP models.