🤖 AI Summary
Conventional lexical semantic representations rarely capture the implicit dimensions of word meaning, such as the situations, atmospheres, and emotions a word evokes in a specific context. This work proposes Scene Abstraction, a framework that represents context-sensitive word meaning as a structured scene with two parts: a Contextual Scene (events, entities, and setting) and an Expression Profile (engaged events, generalizable properties, and evoked emotions). The framework extracts these scenes from usage contexts through few-shot prompting of a large language model. Using this approach, the authors construct COCA-Scenes, a dataset of 520 usage instances across 26 keywords, and report two experiments. In the first, human observers identify scenes with 82.4% accuracy, 11.8 percentage points above text-only embeddings. In the second, the scene profiles receive an 86.4% preference over ATOMIC-based alternatives across three semantic dimensions.
📝 Abstract
Coffee and tea share many properties, yet they evoke strikingly different situations, atmospheres, and affective associations. These situated dimensions of word meaning are real and systematic, but they remain implicit in most computational representations of lexical meaning. We propose Scene Abstraction, a framework for constructing structured representations of the interpretive scenes that words participate in across usage contexts. Each scene consists of a Contextual Scene (Events, Entities, Setting) and an expression-centered Expression Profile (Engaged events, Generalizable properties, Evoked emotions), operationalized through few-shot prompting of a large language model. Our contributions are three-fold: (1) a structured representation framework for situated lexical meaning; (2) COCA-Scenes, a dataset of 520 usage instances across 26 keywords for distinct scene identification; and (3) empirical evidence from two experiments suggesting that scenes are reliably identifiable across human observers (82.4% accuracy, +11.8 pp over text-only embeddings) and that our scene profiles more closely align with human interpretation of words in context than ATOMIC-based alternatives (86.4% preference across three semantic dimensions).
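The two-part scene structure described in the abstract can be pictured as a small data model plus a prompt builder. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class names, field names, prompt layout, and the coffee and tea sentences are all hypothetical and chosen only to mirror the six components named above (Events, Entities, Setting; Engaged events, Generalizable properties, Evoked emotions).

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ContextualScene:
    """What is going on in the usage context (hypothetical field names)."""
    events: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)
    setting: str = ""


@dataclass
class ExpressionProfile:
    """What the target expression contributes to the scene."""
    engaged_events: List[str] = field(default_factory=list)
    generalizable_properties: List[str] = field(default_factory=list)
    evoked_emotions: List[str] = field(default_factory=list)


@dataclass
class Scene:
    keyword: str
    usage: str
    contextual_scene: ContextualScene
    expression_profile: ExpressionProfile


def format_scene(scene: Scene) -> str:
    """Render one annotated scene as a few-shot demonstration."""
    cs, ep = scene.contextual_scene, scene.expression_profile
    return "\n".join([
        f"Keyword: {scene.keyword}",
        f"Usage: {scene.usage}",
        f"Events: {'; '.join(cs.events)}",
        f"Entities: {'; '.join(cs.entities)}",
        f"Setting: {cs.setting}",
        f"Engaged events: {'; '.join(ep.engaged_events)}",
        f"Generalizable properties: {'; '.join(ep.generalizable_properties)}",
        f"Evoked emotions: {'; '.join(ep.evoked_emotions)}",
    ])


def build_few_shot_prompt(examples: List[Scene], keyword: str, usage: str) -> str:
    """Concatenate demonstrations, then leave the new instance open at 'Events:'."""
    shots = "\n\n".join(format_scene(s) for s in examples)
    query = f"Keyword: {keyword}\nUsage: {usage}\nEvents:"
    return f"{shots}\n\n{query}"


# Illustrative annotation invented for this sketch; it is not taken from COCA-Scenes.
example = Scene(
    keyword="coffee",
    usage="He gulped down a coffee before the morning meeting.",
    contextual_scene=ContextualScene(
        events=["drinking quickly", "preparing for a meeting"],
        entities=["a worker", "a cup of coffee"],
        setting="office kitchen",
    ),
    expression_profile=ExpressionProfile(
        engaged_events=["being gulped down"],
        generalizable_properties=["stimulating", "consumed in a hurry"],
        evoked_emotions=["urgency"],
    ),
)

prompt = build_few_shot_prompt(
    [example], "tea", "She poured tea for her guests in the garden."
)
print(prompt)
```

The prompt string would be passed to whichever language model is in use; the model's completion could then be parsed back into a `Scene`. Calling `asdict(example)` yields a nested dictionary that is convenient for storing annotations as JSON.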