AI Summary
This study investigates whether Abstract Meaning Representation (AMR) enhances semantic generalization in Natural Language Inference (NLI) for pretrained language models, systematically comparing fine-tuning versus prompting paradigms. Using large models including GPT-4o, the methodology integrates AMR parsing, multi-stage prompt engineering, and fine-grained ablation analysis. Results show that AMR integration severely degrades generalization under fine-tuning; under prompting, it yields only marginal gains, yet attribution analysis reveals that it amplifies superficial lexical/syntactic discrepancies rather than supporting deep semantic reasoning, leading to frequent misclassification of non-contradictory surface differences as contradictions. This work provides the first empirical evidence that AMR can induce semantic reasoning bias in NLI, challenging the implicit assumption that explicit semantic structures inherently improve inference performance. It delivers critical methodological insights into the alignment between semantic representation and reasoning, cautioning against uncritical adoption of structured meaning representations in downstream reasoning tasks.
Abstract
Natural Language Inference (NLI) relies heavily on adequately parsing the semantic content of the premise and hypothesis. In this work, we investigate whether adding semantic information in the form of an Abstract Meaning Representation (AMR) helps pretrained language models better generalize in NLI. Our experiments integrating AMR into NLI in both fine-tuning and prompting settings show that the presence of AMR in fine-tuning hinders model generalization, while prompting with AMR leads to slight gains in GPT-4o. However, an ablation study reveals that the improvement comes from amplifying surface-level differences rather than aiding semantic reasoning. This amplification can mislead models to predict non-entailment even when the core meaning is preserved.