Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of large language models (LLMs): delivering proactive, personalized tutoring for complex real-world tasks, particularly programming education. To this end, the authors propose a dialogue-based tutoring agent framework tailored to coding tasks. Methodologically, they introduce the novel Trace-and-Verify (TRAVER) workflow, which combines knowledge tracing with turn-by-turn verification, and design DICT, an automated evaluation protocol that closes the tutoring loop through controlled student simulation, code generation, and execution-based testing. The contributions are threefold: (1) the first LLM-driven, proactive, guidance-oriented coding tutor; (2) an interpretable paradigm unifying fine-grained knowledge-state tracking with iterative verification; and (3) empirical results demonstrating significant improvement in task completion rates, alongside a systematic diagnosis of critical LLM bottlenecks in complex tutoring, such as planning consistency and feedback adaptability, together with actionable optimization pathways.
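The paper's actual implementation is not reproduced on this page. As a purely illustrative sketch of the two components the summary names (a knowledge-tracing "trace" step that picks what to hint at, and a "verify" step that executes the student's code against task tests), here is a toy version in Python; every class and function name below is hypothetical, not the paper's API, and the mastery update is a simple stand-in for LLM-based knowledge tracing:

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeState:
    """Toy knowledge-tracing state: concept -> estimated mastery in [0, 1]."""
    mastery: dict = field(default_factory=dict)

    def update(self, concept: str, correct: bool, lr: float = 0.3) -> None:
        # Move the estimate toward 1 on a correct attempt, toward 0 otherwise.
        prev = self.mastery.get(concept, 0.5)
        target = 1.0 if correct else 0.0
        self.mastery[concept] = prev + lr * (target - prev)


def choose_hint(state: KnowledgeState, concepts: list) -> str:
    """Trace step: target the concept the student currently seems weakest on."""
    weakest = min(concepts, key=lambda c: state.mastery.get(c, 0.5))
    return f"hint: revisit {weakest}"


def verify(student_code: str, tests: list) -> bool:
    """Verify step: execute the student's latest code against the task tests."""
    ns: dict = {}
    try:
        exec(student_code, ns)  # a real tutor would sandbox execution
        return all(t(ns) for t in tests)
    except Exception:
        return False
```

For example, after one failed attempt at recursion, `choose_hint(state, ["recursion", "loops"])` would target recursion, and `verify` would report whether the student's next attempt passes the task's tests.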

📝 Abstract
Intelligent tutoring agents powered by large language models (LLMs) have been increasingly explored to deliver personalized guidance in areas such as language learning and science education. However, their capabilities in guiding users to solve complex real-world tasks remain underexplored. To address this limitation, in this work, we focus on coding tutoring, a challenging problem that requires tutors to proactively guide students toward completing predefined coding tasks. We propose a novel agent workflow, Trace-and-Verify (TRAVER), which combines knowledge tracing to estimate a student's knowledge state and turn-by-turn verification to ensure effective guidance toward task completion. We introduce DICT, an automatic evaluation protocol that assesses tutor agents holistically using controlled student simulation and code generation tests. Extensive experiments reveal the challenges of coding tutoring and demonstrate that TRAVER achieves a significantly higher success rate. Although we use code tutoring as an example in this paper, our results and findings can be extended beyond coding, providing valuable insights into advancing tutoring agents for a variety of tasks.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM-based tutoring for complex tasks
Developing TRAVER for coding tutoring efficiency
Introducing DICT for holistic tutor evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs for coding tutoring
Trace-and-Verify workflow
Automatic evaluation protocol DICT
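The abstract describes DICT as assessing tutor agents holistically via controlled student simulation and code generation tests. A minimal sketch of that evaluation loop, assuming a tutor and simulated students are plain callables and success means the student's code passes the task's tests within a turn budget (all names here are hypothetical, not the paper's protocol):

```python
def passes(code: str, tests: list) -> bool:
    """Execution-based check: run candidate code against the task's tests."""
    ns: dict = {}
    try:
        exec(code, ns)  # a real protocol would sandbox this
        return all(t(ns) for t in tests)
    except Exception:
        return False


def run_episode(tutor, student, tests, max_turns: int = 5) -> bool:
    """One tutor-student dialogue; success = code passes tests within budget."""
    code = ""
    for _ in range(max_turns):
        hint = tutor(code)          # tutor responds to the current attempt
        code = student(hint, code)  # simulated student revises the code
        if passes(code, tests):
            return True
    return False


def success_rate(tutor, students, tests, max_turns: int = 5) -> float:
    """Aggregate success over a pool of controlled student simulators."""
    wins = sum(run_episode(tutor, s, tests, max_turns) for s in students)
    return wins / len(students)
```

With scripted students of different ability, e.g. one that writes a correct `square` after the first hint and one that never does, `success_rate` returns the fraction of simulated students the tutor guides to completion.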
👥 Authors
Jian Wang (The Hong Kong Polytechnic University, University of Michigan)
Yinpei Dai (Tsinghua, Alibaba, UMich; Embodied AI, Robotics, Dialogue System)
Yichi Zhang (University of Michigan)
Ziqiao Ma (University of Michigan; Machine Learning, Computational Linguistics)
Wenjie Li (The Hong Kong Polytechnic University)
Joyce Chai (University of Michigan)