Enhancing Hindi NER in Low Context: A Comparative study of Transformer-based models with vs. without Retrieval Augmentation

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited named entity recognition (NER) performance for low-resource Hindi in low-context settings, this paper proposes a Wikipedia-based retrieval-augmented approach that dynamically injects external knowledge into input sequences. We systematically evaluate multiple Transformer architectures—including MuRIL, XLM-R, Llama variants, and GPT-3.5-turbo—under a joint retrieval-augmentation and fine-tuning paradigm. Experimental results demonstrate substantial improvements in low-context NER: MuRIL’s macro-F1 rises from 0.69 to 0.70, while XLM-R achieves a marked gain from 0.495 to 0.71. Crucially, this work provides the first empirical validation of differential effectiveness of retrieval augmentation across diverse large language model paradigms (encoder-only vs. generative) for low-resource language NER. It further reveals that data augmentation strategies must be carefully aligned with underlying model architectures to maximize performance gains.
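The summary describes "dynamically injecting external knowledge into input sequences." A minimal sketch of that idea, assuming a toy in-memory retrieval store in place of the paper's actual Wikipedia retrieval pipeline (the snippet, separator token, and function names are illustrative, not the authors' implementation):

```python
# Hedged sketch: injecting retrieved context into an NER input sequence.
# The "index" below stands in for Wikipedia retrieval; the paper's actual
# retriever and formatting may differ.

# Toy retrieval index: entity surface form -> Wikipedia-style snippet.
WIKI_SNIPPETS = {
    "दिल्ली": "दिल्ली भारत की राजधानी है।",  # "Delhi is the capital of India."
}

def retrieve_context(tokens, index):
    """Return snippets for any token found in the retrieval index."""
    return [index[t] for t in tokens if t in index]

def augment_input(tokens, index, sep="[SEP]"):
    """Append retrieved snippets after a separator, forming one model input."""
    snippets = retrieve_context(tokens, index)
    if not snippets:
        return " ".join(tokens)
    return " ".join(tokens) + f" {sep} " + " ".join(snippets)

# A low-context sentence: "Ram went to Delhi."
sentence = ["राम", "दिल्ली", "गया"]
print(augment_input(sentence, WIKI_SNIPPETS))
```

The augmented string (original tokens, separator, then retrieved context) would then be fed to the encoder or generative model exactly as a longer input sequence, which is why encoder-only and generative models can react to RA differently.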

📝 Abstract
One major challenge in natural language processing is named entity recognition (NER), which identifies and categorises named entities in textual input. To improve NER, this study investigates a Hindi NER technique that uses Hindi-specific pretrained encoders (MuRIL and XLM-R) and generative models (Llama-2-7B-chat-hf (Llama2-7B), Llama-2-70B-chat-hf (Llama2-70B), Llama-3-70B-Instruct (Llama3-70B) and GPT3.5-turbo), and augments the data with external context retrieved from relevant sources, notably Wikipedia. We fine-tuned MuRIL, XLM-R and Llama2-7B with and without Retrieval Augmentation (RA), whereas Llama2-70B, Llama3-70B and GPT3.5-turbo were used for few-shot NER generation. Our investigation shows that these language models (LMs) with RA outperform baseline methods without RA in most cases. The macro F1 scores for MuRIL and XLM-R are 0.69 and 0.495, respectively, without RA, and increase to 0.70 and 0.71, respectively, with RA. Fine-tuned Llama2-7B outperforms the non-fine-tuned Llama2-7B by a significant margin. The generative models that are not fine-tuned also perform better with augmented data: GPT3.5-turbo benefited from RA, whereas Llama2-70B and Llama3-70B did not benefit from our retrieval context. The findings show that RA significantly improves performance, especially on low-context data. This study adds significant knowledge about how best to use data augmentation methods and pretrained models to enhance NER performance, particularly for languages with limited resources.
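The abstract reports macro F1 scores. As a reminder of what that metric averages, here is a minimal sketch: macro F1 is the unweighted mean of per-entity-type F1 scores, so rare entity types count as much as frequent ones (the precision/recall values below are illustrative, not from the paper):

```python
# Hedged sketch: computing macro F1 from per-entity-type precision/recall.
# The numbers are made up for illustration.

def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_type_f1s):
    """Unweighted average of per-type F1 scores: every entity type
    (e.g. PER, LOC, ORG) contributes equally regardless of frequency."""
    return sum(per_type_f1s) / len(per_type_f1s)

# Illustrative per-type (precision, recall) pairs, e.g. PER / LOC / ORG.
scores = [f1(0.80, 0.75), f1(0.70, 0.65), f1(0.60, 0.55)]
print(round(macro_f1(scores), 3))
```

Because macro F1 weights all types equally, a jump such as XLM-R's 0.495 to 0.71 can reflect large gains on entity types that are sparse in low-context Hindi data.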
Problem

Research questions and friction points this paper is trying to address.

Improving Hindi NER using transformer models and retrieval augmentation
Comparing performance of models with vs. without external data augmentation
Enhancing low-context NER for resource-limited languages like Hindi
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Hindi-specific pretrained encoders like MuRIL, XLM-R
Augments data with retrieved Wikipedia contexts
Fine-tunes models with and without Retrieval Augmentation