🤖 AI Summary
This work addresses a limitation of current large language models in open-ended human-AI tutoring: with no structured curriculum and no prior record of the student, sustained instruction is hard to deliver effectively. The authors propose decoupling instructional planning from dialogue generation. An explicit prerequisite knowledge graph encodes the curriculum structure, and a lightweight PPO-based policy selects which node to teach next and how many dialogue turns to spend on it. A large language model then conducts the Socratic dialogue at the chosen node and returns a signal of student progress. The approach is presented as the first to integrate explicit curriculum orchestration into open-ended learning scenarios, rather than relying on model scale alone to improve pedagogical outcomes. On held-out STEM and non-STEM topics, the PPO-paired tutor outperforms heuristic baselines, general-purpose frontier models, and a specialized Socratic model, both in the rate at which students reach full curriculum mastery and in the number of dialogue turns required.
📝 Abstract
Large language models are now widely used for everyday learning, but the underlying interactions are typically unstructured chats that follow no curriculum. Unlike formal online learning systems, these interactions carry no prior record of the student, so any estimate of what the student already knows must be inferred from the dialogue itself. We show that this gap is not closed by scaling models alone. Frontier and education-tuned LLMs perform poorly when asked to tutor a student over an extended session, because doing so requires three things at once: sequencing a curriculum, conducting Socratic dialogue, and inferring the student's knowledge state from that dialogue. We propose separating these responsibilities. Given a student query, our system constructs a prerequisite knowledge graph in which subtopics are nodes and dependencies are edges, and frames tutoring as deciding which node to teach next and how many dialogue turns to spend on it before moving on. A lightweight PPO policy handles this sequencing decision, while an LLM conducts the Socratic exchange at the chosen node and returns a signal of student progress. Across held-out STEM and non-STEM topics, our PPO-paired tutor outperforms heuristic baselines, frontier general-purpose models, and a model specialised for Socratic dialogue, on both the rate at which students reach full curriculum mastery and the number of turns required. Explicit curriculum structure delivers gains that scaling the underlying model does not.
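To make the sequencing decision concrete, the following is a minimal Python sketch of a prerequisite graph and the "which node next, for how many turns" action. It is an illustration under stated assumptions, not the authors' implementation: the names `PrerequisiteGraph`, `eligible_nodes`, and `choose_action` are hypothetical, the rule that a node becomes teachable only once all its prerequisites are mastered is an assumption, and the fixed first-eligible choice merely stands in for the learned PPO policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PrerequisiteGraph:
    """Subtopics are nodes; prerequisites[node] holds the nodes it depends on."""

    prerequisites: dict[str, set[str]]
    mastered: set[str] = field(default_factory=set)

    def eligible_nodes(self) -> list[str]:
        # Assumption: a node is teachable once all of its prerequisites are mastered.
        return sorted(
            node
            for node, prereqs in self.prerequisites.items()
            if node not in self.mastered and prereqs <= self.mastered
        )

    def fully_mastered(self) -> bool:
        return self.mastered >= set(self.prerequisites)


def choose_action(graph: PrerequisiteGraph, turn_budget: int = 3) -> tuple[str, int] | None:
    """Hypothetical stand-in for the sequencing policy.

    Returns (node to teach next, number of dialogue turns), or None when
    nothing is left to teach. A learned policy would replace this heuristic.
    """
    candidates = graph.eligible_nodes()
    if not candidates:
        return None
    return candidates[0], turn_budget


# Example curriculum (made up for illustration).
graph = PrerequisiteGraph(
    prerequisites={
        "fractions": set(),
        "ratios": {"fractions"},
        "percentages": {"fractions", "ratios"},
    }
)
action = choose_action(graph)  # picks "fractions", the only node with no unmet prerequisites
```

In this sketch the tutoring LLM would run the dialogue for the chosen node and turn budget, and its progress signal would decide whether the node is added to `mastered` before the next action is chosen.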