🤖 AI Summary
This study investigates how simplification instructions affect the semantic completeness of homonym definitions generated by large language models (LLMs), and how this in turn affects user comprehension. We identify a critical problem: over-simplification frequently omits essential senses, which risks misinterpretation. To address this, we construct two novel multilingual evaluation datasets designed for assessing homonym definition quality and show empirically that simplification prompts substantially reduce how many senses the models cover. We then apply Direct Preference Optimization (DPO) to the definition generation task and use it to fine-tune Llama 3.1 8B. The DPO-fine-tuned model identifies multiple senses more reliably and presents them in a more balanced way across all prompting conditions, outperforming the baselines in both LLM-as-Judge and human evaluations. The approach offers a scalable, preference-based way to make lexical definitions more accurate while keeping them adaptable to different users.
📝 Abstract
Large Language Models (LLMs) can provide accurate word definitions and explanations for any context. However, the appropriate scope of a definition differs between target groups, such as children or language learners. This is especially relevant for homonyms, words with multiple meanings, where oversimplification risks losing information by omitting key senses, potentially misleading users who trust LLM outputs. We investigate how simplification affects homonym definition quality across three target groups: Normal, Simple, and ELI5 ("Explain Like I'm Five"). Using two novel evaluation datasets spanning multiple languages, we test DeepSeek v3, Llama 4 Maverick, Qwen3-30B-A3B, GPT-4o mini, and Llama 3.1 8B via LLM-as-Judge and human annotations. Our results show that simplification drastically degrades definition completeness by neglecting polysemy, increasing the risk of misunderstanding. Fine-tuning Llama 3.1 8B with Direct Preference Optimization substantially improves the quality of its homonym responses across all prompt types. These findings highlight the need to balance simplicity and completeness in educational NLP to ensure reliable, context-aware definitions for all learners.
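The abstract does not spell out the training objective. The following is a minimal sketch of the standard DPO loss (Rafailov et al., 2023) for a single preference pair. It assumes a setup where the preferred ("chosen") response is a definition covering all senses of a homonym and the dispreferred ("rejected") one omits some. The function name `dpo_loss`, the value β = 0.1, and the log-probability numbers are illustrative assumptions, not the authors' configuration.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed token log-probability of a full response
    under either the trainable policy or the frozen reference model.
    """
    # Log-ratio of policy to reference for the preferred and dispreferred answers.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical pair: chosen = a definition of "bank" giving both the river and
# the financial sense; rejected = a simplified definition giving only one sense.
loss = dpo_loss(policy_chosen_logp=-12.0, policy_rejected_logp=-9.0,
                ref_chosen_logp=-14.0, ref_rejected_logp=-8.0)
print(round(loss, 3))
```

When the policy raises the likelihood of the multi-sense definition relative to the reference and lowers that of the one-sense definition, the margin grows and the loss falls below log 2 (its value at zero margin). This is how the preference signal pushes the model toward complete definitions.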