🤖 AI Summary
Defining and generating pedagogically effective questions—those yielding measurable learning gains—remains a fundamental challenge in educational AI.
Method: We propose QUEST, a framework that (1) formally defines and empirically estimates question utility based on real-world learning outcomes (e.g., post-intervention exam score improvements); (2) constructs an LLM-driven learning environment simulator to enable computationally tractable utility evaluation; and (3) introduces a utility-driven rejection sampling fine-tuning paradigm, replacing conventional approaches reliant on pedagogical heuristics or indirect proxies (e.g., information gain).
Contribution/Results: Experiments demonstrate that QUEST-generated questions yield average exam scores at least 20% higher than those of state-of-the-art baselines. This work establishes a novel, quantifiable, and optimization-friendly paradigm for learning-oriented question generation.
📝 Abstract
Asking questions is a fundamental aspect of learning that facilitates deeper understanding. However, characterizing and crafting questions that effectively improve learning remains elusive. To address this gap, we propose QUEST (Question Utility Estimation with Simulated Tests). QUEST simulates a learning environment that enables quantifying a question's utility based on its direct impact on improving learning outcomes. We then identify high-utility questions and use them to fine-tune question generation models with rejection sampling. We find that questions generated by models trained with utility-based rejection sampling yield exam scores at least 20% higher than those from specialized prompting grounded in the educational-objectives literature and from models fine-tuned with indirect measures of question quality, such as saliency and expected information gain.
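The core loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`simulated_exam_score`, `question_utility`, `rejection_sample`), the knowledge model, and the utility threshold are all assumptions standing in for QUEST's LLM-driven learner simulator and its actual sampling criteria.

```python
def simulated_exam_score(knowledge: float) -> float:
    """Toy stand-in for QUEST's LLM learner simulator: maps the
    simulated learner's knowledge level to an exam score in [0, 1]."""
    return max(0.0, min(1.0, knowledge))


def question_utility(simulated_gain: float, prior_knowledge: float) -> float:
    """Utility of a question, defined as in the paper's framing:
    post-intervention exam score minus pre-intervention exam score."""
    before = simulated_exam_score(prior_knowledge)
    after = simulated_exam_score(prior_knowledge + simulated_gain)
    return after - before


def rejection_sample(candidates, prior_knowledge=0.5, threshold=0.2):
    """Keep only candidate questions whose simulated utility meets the
    threshold; the kept questions would then form the fine-tuning set.

    `candidates` is a list of (question_text, simulated_gain) pairs,
    where simulated_gain is what the learner simulator would report."""
    return [
        question
        for question, gain in candidates
        if question_utility(gain, prior_knowledge) >= threshold
    ]
```

For example, with `candidates = [("Why does photosynthesis need light?", 0.3), ("Define photosynthesis.", 0.05)]`, only the first question survives the utility filter and would be added to the rejection-sampling fine-tuning set.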