HuAMR: A Hungarian AMR Parser and Dataset

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Severe scarcity of semantic resources for non-English languages, particularly Hungarian, hampers progress in Abstract Meaning Representation (AMR) research. Method: This work introduces HuAMR, the first Hungarian AMR dataset, constructed via a "large language model (LLM) generation + human refinement" paradigm: initial AMR graphs are generated with Llama-3.1-70B and then manually refined to ensure quality. Dedicated AMR parsers are then trained on this data using mT5-Large and Llama-3.2-1B. Contribution/Results: Fine-tuning on HuAMR improves Smatch scores on Hungarian news text, the dataset's domain, although adding silver-standard AMRs to the training data of smaller models does not consistently raise overall scores. The HuAMR dataset and trained parsers are publicly released, providing infrastructure and a reproducible methodology for AMR development in low-resource languages.
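The silver-annotation step of the "LLM generation + human refinement" paradigm can be sketched as a few-shot prompting setup. This is a hypothetical illustration only: the `build_amr_prompt` helper, the example pair, and the instruction wording are all assumptions, since the paper states only that Llama-3.1-70B generated the initial graphs.

```python
# Hypothetical sketch of the silver-annotation step: assemble a few-shot
# prompt asking an LLM to emit an AMR graph in PENMAN notation.
# The example pair and wording are illustrative, not the authors' prompt.

FEW_SHOT = [
    # (Hungarian sentence, its AMR graph in PENMAN notation)
    ("A fiú olvas.",  # "The boy reads."
     "(r / read-01 :ARG0 (b / boy))"),
]

def build_amr_prompt(sentence: str) -> str:
    """Assemble a few-shot prompt for AMR generation (illustrative)."""
    parts = ["Produce the Abstract Meaning Representation (AMR) graph "
             "in PENMAN notation for the given Hungarian sentence.\n"]
    for src, amr in FEW_SHOT:
        parts.append(f"Sentence: {src}\nAMR: {amr}\n")
    parts.append(f"Sentence: {sentence}\nAMR:")
    return "\n".join(parts)

prompt = build_amr_prompt("A lány almát eszik.")  # "The girl eats an apple."
# The prompt would then be sent to Llama-3.1-70B; the returned graph
# becomes a silver annotation to be refined manually.
```
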

📝 Abstract
We present HuAMR, the first Abstract Meaning Representation (AMR) dataset and a suite of large language model-based AMR parsers for Hungarian, targeting the scarcity of semantic resources for non-English languages. To create HuAMR, we employed Llama-3.1-70B to automatically generate silver-standard AMR annotations, which we then refined manually to ensure quality. Building on this dataset, we investigate how different model architectures - mT5 Large and Llama-3.2-1B - and fine-tuning strategies affect AMR parsing performance. While incorporating silver-standard AMRs from Llama-3.1-70B into the training data of smaller models does not consistently boost overall scores, our results show that these techniques effectively enhance parsing accuracy on Hungarian news data (the domain of HuAMR). We evaluate our parsers using Smatch scores and confirm the potential of HuAMR and our parsers for advancing semantic parsing research.
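Seq2seq parsers such as the mT5-Large model mentioned in the abstract are typically trained on linearized graphs: the multi-line PENMAN string is collapsed into a single token sequence that the model learns to emit. Below is a minimal sketch of that common preprocessing step; the exact linearization scheme used for HuAMR is not specified here, so this is an assumption.

```python
import re

def linearize_amr(penman: str) -> str:
    """Collapse a multi-line PENMAN graph into one whitespace-normalized
    line, a common target format for seq2seq AMR parsers."""
    return re.sub(r"\s+", " ", penman).strip()

graph = """(e / eat-01
   :ARG0 (g / girl)
   :ARG1 (a / apple))"""

target = linearize_amr(graph)
# target == "(e / eat-01 :ARG0 (g / girl) :ARG1 (a / apple))"
```

The inverse step (re-indenting and validating the predicted string as a well-formed graph) is applied at decoding time before evaluation.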
Problem

Research questions and friction points this paper is trying to address.

Addressing the scarcity of semantic resources for non-English languages.
Developing an AMR dataset and AMR parsers for Hungarian.
Evaluating how model architectures and fine-tuning strategies affect AMR parsing.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Llama-3.1-70B generates silver-standard AMR annotations
Manual refinement ensures a high-quality AMR dataset
Fine-tuning mT5-Large and Llama-3.2-1B improves in-domain parsing
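Smatch, the metric used to evaluate the parsers above, scores a predicted AMR against a gold AMR by the F-measure over matched triples after searching for the best variable alignment. The toy sketch below computes the triple F-score for a fixed (already aligned) variable mapping, skipping Smatch's hill-climbing alignment search; it illustrates the arithmetic, not the full algorithm.

```python
def triple_f1(pred: set, gold: set) -> float:
    """F1 over shared (source, relation, target) triples, assuming
    variables are already aligned (real Smatch searches this mapping)."""
    if not pred or not gold:
        return 0.0
    matched = len(pred & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)

# "The girl eats an apple." vs. a parse that misses the :ARG1 edge.
gold = {("e", "instance", "eat-01"), ("g", "instance", "girl"),
        ("a", "instance", "apple"), ("e", "ARG0", "g"), ("e", "ARG1", "a")}
pred = {("e", "instance", "eat-01"), ("g", "instance", "girl"),
        ("e", "ARG0", "g")}

score = triple_f1(pred, gold)  # 3 matched: P = 1.0, R = 0.6, F1 = 0.75
```

In practice the reference `smatch` tool handles the variable-mapping search; this sketch only shows why a parser that drops an edge is penalized on recall.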