When Does Meaning Backfire? Investigating the Role of AMRs in NLI

📅 2025-06-17

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This study investigates whether Abstract Meaning Representation (AMR) helps pretrained language models generalize better in Natural Language Inference (NLI), comparing fine-tuning and prompting settings. The methodology combines AMR parsing, prompt engineering, and ablation analysis, with GPT-4o among the models evaluated. Under fine-tuning, the presence of AMR hinders generalization; under prompting, it yields only slight gains with GPT-4o. The ablation study indicates that those gains come from amplifying surface-level differences between premise and hypothesis rather than from better semantic reasoning, which can mislead models into predicting non-entailment even when the core meaning is preserved. The findings challenge the implicit assumption that explicit semantic structures inherently improve inference performance, and caution against uncritical adoption of structured meaning representations in downstream reasoning tasks.

📝 Abstract

Natural Language Inference (NLI) relies heavily on adequately parsing the semantic content of the premise and hypothesis. In this work, we investigate whether adding semantic information in the form of an Abstract Meaning Representation (AMR) helps pretrained language models better generalize in NLI. Our experiments integrating AMR into NLI in both fine-tuning and prompting settings show that the presence of AMR in fine-tuning hinders model generalization while prompting with AMR leads to slight gains in GPT-4o. However, an ablation study reveals that the improvement comes from amplifying surface-level differences rather than aiding semantic reasoning. This amplification can mislead models to predict non-entailment even when the core meaning is preserved.
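
To make the prompting setting concrete, here is a minimal sketch of how AMR graphs might be added to an NLI prompt. It is not the paper's template, which is not given here: the function name `build_nli_prompt`, the prompt wording, the two-way label set, and the hand-written AMR strings are all assumptions for illustration. Leaving out the AMR arguments gives a text-only prompt, which is the kind of pairing an ablation would compare.

```python
from typing import Optional, Sequence


def build_nli_prompt(
    premise: str,
    hypothesis: str,
    premise_amr: Optional[str] = None,
    hypothesis_amr: Optional[str] = None,
    labels: Sequence[str] = ("entailment", "non-entailment"),
) -> str:
    """Assemble a zero-shot NLI prompt, optionally including AMR graphs.

    Hypothetical template: the wording and label set are assumptions,
    not the prompt used in the paper.
    """
    lines = [
        "Decide whether the premise entails the hypothesis.",
        f"Premise: {premise}",
    ]
    # Each AMR graph is added only when supplied, so the same function
    # produces both the text-only and the text+AMR variant.
    if premise_amr is not None:
        lines.append(f"Premise AMR: {premise_amr}")
    lines.append(f"Hypothesis: {hypothesis}")
    if hypothesis_amr is not None:
        lines.append(f"Hypothesis AMR: {hypothesis_amr}")
    lines.append("Answer with one of: " + ", ".join(labels) + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    # AMR strings written by hand in PENMAN notation for illustration;
    # in practice they would come from an AMR parser.
    premise = "The boy wants to go."
    hypothesis = "The boy wants to leave."
    premise_amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
    hypothesis_amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (l / leave-11 :ARG0 b))"

    print(build_nli_prompt(premise, hypothesis))
    print()
    print(build_nli_prompt(premise, hypothesis, premise_amr, hypothesis_amr))
```

In the text+AMR variant, the two graphs differ in a single concept node even though the sentences are close in meaning. A small structural difference made this visible is one plausible way surface-level differences could be amplified, in line with what the abstract reports.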

Problem

Research questions and friction points this paper is trying to address.

Investigates AMR impact on NLI generalization

Examines AMR in fine-tuning vs prompting

Reveals AMR amplifies surface-level differences

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating AMR into NLI fine-tuning

Prompting GPT-4o with AMR, which yields slight gains

Ablation study showing that AMR amplifies surface-level differences
