QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing question generation (QG) evaluation methods lack alignment with psychometric metrics, failing to reflect true item quality across dimensions such as topic coverage, difficulty, discrimination, and distractor efficiency. Method: This paper introduces Classical Test Theory (CTT) into QG evaluation for the first time. We construct item pairs exhibiting significant quality differences and propose QG-SMS—a large language model (LLM)-based student modeling and simulation framework for interpretable, multi-dimensional automatic assessment. QG-SMS integrates LLM-driven student behavioral modeling, simulated response generation, CTT-based metric computation, and human validation. Contribution/Results: Experiments demonstrate that QG-SMS substantially improves the discriminative accuracy and robustness of QG systems in evaluating educational item quality. Its assessments strongly correlate with actual student performance and outperform conventional automated metrics.

📝 Abstract
While the Question Generation (QG) task has been increasingly adopted in educational assessments, its evaluation remains limited by approaches that lack a clear connection to the educational values of test items. In this work, we introduce test item analysis, a method frequently used by educators to assess test question quality, into QG evaluation. Specifically, we construct pairs of candidate questions that differ in quality across dimensions such as topic coverage, item difficulty, item discrimination, and distractor efficiency. We then examine whether existing QG evaluation approaches can effectively distinguish these differences. Our findings reveal significant shortcomings in these approaches with respect to accurately assessing test item quality in relation to student performance. To address this gap, we propose a novel QG evaluation framework, QG-SMS, which leverages Large Language Models for Student Modeling and Simulation to perform test item analysis. As demonstrated in our extensive experiments and human evaluation study, the additional perspectives introduced by the simulated student profiles lead to a more effective and robust assessment of test items.
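To make the item-analysis dimensions above concrete, here is a minimal sketch of two standard Classical Test Theory statistics (item difficulty and an upper-lower discrimination index) computed over a matrix of simulated student responses. The function names, the example response matrix, and the 27% grouping fraction are illustrative assumptions, not details from the paper.

```python
def item_difficulty(responses, item):
    """CTT difficulty (p-value): proportion of students answering `item` correctly."""
    scores = [row[item] for row in responses]
    return sum(scores) / len(scores)

def item_discrimination(responses, item, frac=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower,
    comparing the top and bottom `frac` of students by total score."""
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, int(len(ranked) * frac))
    upper = sum(row[item] for row in ranked[:k]) / k   # accuracy of high scorers
    lower = sum(row[item] for row in ranked[-k:]) / k  # accuracy of low scorers
    return upper - lower

# Each row: one simulated student's 0/1 correctness on 4 items (toy data).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
print(item_difficulty(responses, 0))      # 0.6
print(item_discrimination(responses, 0))  # 1.0: item 0 separates high from low scorers
```

A well-functioning item typically has moderate difficulty and positive discrimination; QG-SMS obtains such statistics from LLM-simulated student profiles rather than real test administrations.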
Problem

Research questions and friction points this paper is trying to address.

Evaluating test item quality in educational assessments
Identifying shortcomings in existing QG evaluation methods
Proposing QG-SMS for enhanced test item analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates test item analysis into QG evaluation
Uses Large Language Model for student simulation
Enhances assessment with simulated student profiles
Bang Nguyen
University of Notre Dame
Tingting Du
University of Wisconsin-Madison
Mengxia Yu
PhD student, University of Notre Dame
Natural Language Processing · Large Language Models
Lawrence Angrave
University of Illinois at Urbana-Champaign
Meng Jiang
University of Notre Dame