🤖 AI Summary
Defining and generating pedagogically effective questions—those yielding measurable learning gains—remains a fundamental challenge in educational AI.
Method: We propose QUEST, a framework that (1) formally defines and empirically estimates question utility based on real-world learning outcomes (e.g., post-intervention exam score improvements); (2) constructs an LLM-driven learning environment simulator to enable computationally tractable utility evaluation; and (3) introduces a utility-driven rejection sampling fine-tuning paradigm, replacing conventional approaches reliant on pedagogical heuristics or indirect proxies (e.g., information gain).
Contribution/Results: Experiments demonstrate that QUEST-generated questions yield average exam scores at least 20% higher than those of state-of-the-art baselines. This work establishes a novel, quantifiable, and optimization-friendly paradigm for learning-oriented question generation.
📝 Abstract
Asking questions is a fundamental aspect of learning that facilitates deeper understanding. However, characterizing and crafting questions that effectively improve learning remains elusive. To address this gap, we propose QUEST (Question Utility Estimation with Simulated Tests). QUEST simulates a learning environment that enables quantifying a question's utility based on its direct impact on improving learning outcomes. We then identify high-utility questions and use them to fine-tune question generation models with rejection sampling. We find that questions generated by models trained with utility-based rejection sampling yield exam scores at least 20% higher than those from specialized prompting grounded in the educational-objectives literature and from models fine-tuned with indirect measures of question quality, such as saliency and expected information gain.
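The core loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`simulated_exam_score`, `question_utility`, `rejection_sample`), the knowledge model, and the utility threshold are all assumptions standing in for QUEST's LLM-driven learner simulator and its actual sampling criteria.

```python
def simulated_exam_score(knowledge: float) -> float:
    """Toy stand-in for QUEST's LLM learner simulator: maps the
    simulated learner's knowledge level to an exam score in [0, 1]."""
    return max(0.0, min(1.0, knowledge))


def question_utility(simulated_gain: float, prior_knowledge: float) -> float:
    """Utility of a question, defined as in the paper's framing:
    post-intervention exam score minus pre-intervention exam score."""
    before = simulated_exam_score(prior_knowledge)
    after = simulated_exam_score(prior_knowledge + simulated_gain)
    return after - before


def rejection_sample(candidates, prior_knowledge=0.5, threshold=0.2):
    """Keep only candidate questions whose simulated utility meets the
    threshold; the kept questions would then form the fine-tuning set.

    `candidates` is a list of (question_text, simulated_gain) pairs,
    where simulated_gain is what the learner simulator would report."""
    return [
        question
        for question, gain in candidates
        if question_utility(gain, prior_knowledge) >= threshold
    ]
```

For example, with `candidates = [("Why does photosynthesis need light?", 0.3), ("Define photosynthesis.", 0.05)]`, only the first question survives the utility filter and would be added to the rejection-sampling fine-tuning set.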