Question Type, Cognitive Load, and CEFR Alignment: Evaluating LLM-Generated EFL Grammar Drill Exercises

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study evaluates the pedagogical effectiveness of large language model (LLM)-generated grammar exercises for English as a foreign language (EFL) learners, focusing on how question format influences cognitive load and on whether CEFR-J difficulty levels match actual task demands. Drawing on real-world interaction logs from Japanese junior high school students using a grammar practice application, the research combines learning analytics, cognitive load theory, and the CEFR-J framework to compare the impact of multiple-choice, fill-in-the-blank (cloze), and drag-and-drop items on learner performance. Findings indicate that multiple-choice questions impose the lowest cognitive load, fill-in-the-blank tasks pose the greatest barrier to active recall, and drag-and-drop items take the longest to answer; furthermore, CEFR-J levels track learner data closely, with accuracy declining and response times rising as levels advance. The work offers empirical support for the instructional viability of LLM-generated materials and underscores the role of item format in moving learners from passive recognition to active language production.

📝 Abstract

This study evaluates the pedagogical viability of LLM-generated English as a Foreign Language (EFL) learning content. Utilising log data from Japanese junior high school students practising on a grammar drilling application, we analysed how different question modalities affect student performance and whether the theoretical, localised CEFR difficulty tiers (CEFR-J) accurately predict empirical task difficulty. Results reveal a clear performance hierarchy: multiple-choice questions carried the lowest cognitive load, cloze tasks posed the greatest barrier to active recall, and drag-and-drop exercises incurred the heaviest time penalties. Furthermore, learner data validated the CEFR-J grammar framework, showing a steady decline in accuracy and a steady rise in response times as proficiency levels advanced. These findings demonstrate that LLMs can successfully generate learning content, while highlighting the need for developers to sequence question modalities strategically so that learners move from passive recognition to active linguistic production.
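The abstract does not describe the analysis pipeline, so the following is only a minimal sketch of how accuracy and response time could be aggregated per question modality and per CEFR-J level from drill logs. The record layout, the `summarise` helper, and every row in `LOGS` are hypothetical and invented for illustration; none of it is data or code from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log records: (question_type, cefr_j_level, correct, response_seconds).
# These rows are invented for illustration and are NOT data from the study.
LOGS = [
    ("multiple_choice", "A1.1", True, 4.0),
    ("multiple_choice", "A1.2", True, 6.0),
    ("cloze", "A1.1", False, 8.0),
    ("cloze", "A1.2", True, 10.0),
    ("drag_and_drop", "A1.1", True, 12.0),
    ("drag_and_drop", "A1.2", False, 16.0),
]


def summarise(logs, key_index):
    """Group records by the column at key_index.

    Returns {key: (accuracy, mean_response_seconds)}.
    """
    groups = defaultdict(list)
    for record in logs:
        groups[record[key_index]].append(record)
    return {
        key: (
            mean(1.0 if r[2] else 0.0 for r in rows),  # share of correct answers
            mean(r[3] for r in rows),                  # average response time
        )
        for key, rows in groups.items()
    }


by_type = summarise(LOGS, 0)   # compare question modalities
by_level = summarise(LOGS, 1)  # compare CEFR-J levels

for name, (accuracy, seconds) in by_type.items():
    print(f"{name}: accuracy={accuracy:.2f}, mean time={seconds:.1f}s")
```

Grouping by a column index keeps the sketch short; with real logs, the same two summaries (by modality and by level) are what one would inspect to see whether accuracy falls and response time rises as levels advance.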

Problem

Research questions and friction points this paper is trying to address.

LLM-generated content

EFL grammar exercises

question modality

cognitive load

CEFR alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-generated content

cognitive load

question modality