LLMs Got Rhythm? Hybrid Phonological Filtering for Greek Poetry Rhyme Detection and Generation

📅 2026-01-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses phonological rhyme tasks in lower-resource languages such as Modern Greek, where large language models (LLMs) show significant limitations. It proposes the first hybrid framework that integrates deterministic phonological rules with LLMs, built around an agentic generation pipeline with phonological verification. The framework identifies and generates fine-grained rhyme types, including pure, rich, and imperfect rhymes. Experiments across multiple LLMs (Claude, GPT-4o, Gemini, Llama, and Mistral) with zero-shot, few-shot, chain-of-thought, and retrieval-augmented prompting show that pure LLM generation yields valid poems in under 4% of cases, whereas the hybrid verification loop reaches a success rate of 73.1%. The study also introduces the first high-quality Greek rhyme corpus, comprising over 40,000 cleaned rhyme pairs, and reports a “reasoning gap”: a reasoning-heavy model (Claude 4.5) reaches its best rhyme identification accuracy (54%) only with chain-of-thought prompting, while a more intuitive model (Claude 3.7) reaches 40%.

📝 Abstract

Large Language Models (LLMs), despite their remarkable capabilities across NLP tasks, struggle with phonologically-grounded phenomena like rhyme detection and generation. This is even more evident in lower-resource languages such as Modern Greek. In this paper, we present a hybrid system that combines LLMs with deterministic phonological algorithms to achieve accurate rhyme identification/analysis and generation. Our approach implements a comprehensive taxonomy of Greek rhyme types, including Pure, Rich, Imperfect, Mosaic, and Identical Pre-rhyme Vowel (IDV) patterns, and employs an agentic generation pipeline with phonological verification. We evaluate multiple prompting strategies (zero-shot, few-shot, Chain-of-Thought, and RAG-augmented) across several LLMs including Claude 3.7 and 4.5, GPT-4o, Gemini 2.0 and open-weight models like Llama 3.1 8B and 70B and Mistral Large. Results reveal a significant "Reasoning Gap": while native-like models (Claude 3.7) perform intuitively (40% accuracy in identification), reasoning-heavy models (Claude 4.5) achieve state-of-the-art performance (54%) only when prompted with Chain-of-Thought. Most critically, pure LLM generation fails catastrophically (under 4% valid poems), while our hybrid verification loop restores performance to 73.1%. We release our system and a corpus of 40,000+ rhymes, derived from the Anemoskala and Interwar Poetry corpora, to support future research.
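To make the idea of a verification loop concrete, here is a minimal Python sketch of a generate-then-verify cycle. It is an illustration under stated assumptions, not the authors' system. The names (`rhyme_tail`, `is_pure_rhyme`, `generate_with_verification`) are hypothetical, the vowel folding is a crude stand-in for real Greek grapheme-to-phoneme rules, and "pure rhyme" is taken here to mean identical sounds from the stressed vowel to the end of the word, which may differ from the paper's definition. The generator is any callable that returns a candidate couplet; in practice it would wrap an LLM call.

```python
import unicodedata

# Crude vowel folding (an assumption for illustration, not the paper's rules).
# Known gaps: "αυ"/"ευ" and consonant clusters such as "μπ"/"ντ" are not handled.
DIGRAPHS = {"ει": "i", "οι": "i", "υι": "i", "αι": "e", "ου": "u"}
VOWELS = {"α": "a", "ε": "e", "η": "i", "ι": "i", "υ": "i", "ο": "o", "ω": "o"}
TONOS, DIAERESIS = "\u0301", "\u0308"


def _units(word):
    """Split a word into (sound, is_stressed, is_vowel) units."""
    decomposed = unicodedata.normalize("NFD", word.lower().replace("ς", "σ"))
    tokens = []  # each token: [base letter, stressed?, has diaeresis?]
    for ch in decomposed:
        if ch == TONOS and tokens:
            tokens[-1][1] = True
        elif ch == DIAERESIS and tokens:
            tokens[-1][2] = True
        elif not unicodedata.combining(ch):
            tokens.append([ch, False, False])
    units, i = [], 0
    while i < len(tokens):
        base, stressed, _ = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        pair = base + nxt[0] if nxt else ""
        # A digraph carries its accent on the second letter and has no diaeresis.
        if pair in DIGRAPHS and not stressed and not nxt[2]:
            units.append((DIGRAPHS[pair], nxt[1], True))
            i += 2
        else:
            units.append((VOWELS.get(base, base), stressed, base in VOWELS))
            i += 1
    return units


def rhyme_tail(word):
    """Return the sounds from the stressed vowel (or first vowel) to the end."""
    units = _units(word)
    stressed = [i for i, u in enumerate(units) if u[1]]
    if stressed:
        start = stressed[-1]
    else:  # unaccented monosyllable: fall back to the first vowel
        vowels = [i for i, u in enumerate(units) if u[2]]
        if not vowels:
            return ""
        start = vowels[0]
    return "".join(u[0] for u in units[start:])


def is_pure_rhyme(a, b):
    """True when both words share a non-empty, identical rhyme tail."""
    tail = rhyme_tail(a)
    return bool(tail) and tail == rhyme_tail(b)


def last_word(line):
    """Return the final word of a verse line, stripped of punctuation."""
    words = [w.strip(".,;·!?«»\"'") for w in line.split()]
    words = [w for w in words if w]
    return words[-1] if words else ""


def generate_with_verification(generate, max_attempts=5):
    """Call generate(attempt) until a couplet passes the rhyme check."""
    for attempt in range(max_attempts):
        first, second = generate(attempt)
        if is_pure_rhyme(last_word(first), last_word(second)):
            return first, second
    return None  # no candidate passed verification
```

With a stub generator that first proposes the couplet ("Ήρθε η μέρα", "και το φως") and then ("Ήρθε η μέρα", "πήγαμε πέρα"), the loop rejects the first candidate and returns the second. The point of the design is that acceptance is decided by a deterministic check rather than by the model's own judgment of whether its lines rhyme.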

Problem

Research questions and friction points this paper is trying to address.

rhyme detection

rhyme generation

Large Language Models

phonological phenomena

low-resource languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid phonological filtering

rhyme generation

Large Language Models