Hierarchical Apprenticeship Learning from Imperfect Demonstrations with Evolving Rewards

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional apprenticeship learning relies on optimal demonstrations and fixed reward structures, which struggle to accommodate the imperfect behaviors of learners and the dynamically evolving nature of educational goals in real-world teaching scenarios. To address this limitation, this work proposes HALIDE, a novel framework that uniquely treats suboptimal student demonstrations as structured signals. By integrating hierarchical reinforcement learning with multi-granularity behavioral abstraction, HALIDE jointly models demonstration quality rankings and time-varying rewards across multiple levels to infer high-level learning intentions. This approach effectively distinguishes transient errors, suboptimal strategies, and meaningful progress. Empirical results demonstrate that HALIDE significantly outperforms existing methods in accurately predicting pedagogical decisions.
📝 Abstract
While apprenticeship learning has shown promise for inducing effective pedagogical policies directly from student interactions in e-learning environments, most existing approaches rely on optimal or near-optimal expert demonstrations under a fixed reward. Real-world student interactions, however, are often inherently imperfect and evolving: students explore, make errors, revise strategies, and refine their goals as understanding develops. In this work, we argue that imperfect student demonstrations are not noise to be discarded, but structured signals, provided their relative quality is ranked. We introduce HALIDE, Hierarchical Apprenticeship Learning from Imperfect Demonstrations with Evolving Rewards, which not only leverages suboptimal student demonstrations, but ranks them within a hierarchical learning framework. HALIDE models student behavior at multiple levels of abstraction, enabling inference of higher-level intent and strategy from suboptimal actions while explicitly capturing the temporal evolution of student reward functions. By integrating demonstration quality into hierarchical reward inference, HALIDE distinguishes transient errors from suboptimal strategies and meaningful progress toward higher-level learning goals. Our results show that HALIDE more accurately predicts student pedagogical decisions than approaches that rely on optimal trajectories, fixed rewards, or unranked imperfect demonstrations.
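The abstract's key ingredient, ranking imperfect demonstrations so that relative quality rather than assumed optimality drives reward inference, can be sketched generically. The snippet below is not HALIDE itself: it is a minimal Bradley-Terry pairwise ranking loss over trajectories, in the spirit of preference-based reward learning (e.g., T-REX), assuming a linear reward over state features and synthetic data. All names (`traj_return`, `ranking_loss_grad`, `true_w`) are illustrative, not from the paper.

```python
# Illustrative sketch (NOT the authors' implementation): inferring a reward
# function from quality-ranked, suboptimal trajectories via a Bradley-Terry
# pairwise ranking loss, fitted with plain SGD.
import numpy as np

rng = np.random.default_rng(0)

def traj_return(w, traj):
    """Predicted return of a trajectory: sum of linear per-step rewards w^T phi(s)."""
    return float(np.sum(traj @ w))

def ranking_loss_grad(w, worse, better):
    """Bradley-Terry loss -log P(better > worse) and its gradient w.r.t. w."""
    r_w, r_b = traj_return(w, worse), traj_return(w, better)
    z = np.clip(r_w - r_b, -30.0, 30.0)        # clip for numerical stability
    p = 1.0 / (1.0 + np.exp(z))                # P(better preferred) = sigmoid(r_b - r_w)
    grad = -(1.0 - p) * (better.sum(axis=0) - worse.sum(axis=0))
    return -np.log(p), grad

# Synthetic demonstrations: 20 trajectories of 10 steps with 3-dim state
# features; a hidden weight vector true_w induces the quality ranking.
true_w = np.array([1.0, -0.5, 0.2])
trajs = [rng.normal(size=(10, 3)) for _ in range(20)]
trajs.sort(key=lambda t: float(np.sum(t @ true_w)))  # worst -> best

# Fit reward weights by SGD over randomly sampled ranked pairs.
w = np.zeros(3)
for _ in range(300):
    i, j = sorted(rng.choice(len(trajs), size=2, replace=False))
    _, g = ranking_loss_grad(w, trajs[i], trajs[j])  # trajs[i] ranked worse
    w -= 0.05 * g

# The learned reward should now rank a clearly worse trajectory below a better one.
print("alignment with hidden weights:", float(w @ true_w))
```

The ranking signal alone, with no optimality assumption on any single demonstration, is enough to recover a reward that orders trajectories correctly; this is the sense in which imperfect demonstrations act as structured signals once their relative quality is known.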
Problem

Research questions and friction points this paper is trying to address.

Apprenticeship Learning
Imperfect Demonstrations
Evolving Rewards
Hierarchical Learning
Student Modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Apprenticeship Learning
Imperfect Demonstrations
Evolving Rewards
Demonstration Ranking
Pedagogical Policy Inference
Md Mirajul Islam
North Carolina State University, Raleigh, NC 27606, USA
Rajesh Debnath
North Carolina State University, Raleigh, NC 27606, USA
Adittya Soukarjya Saha
North Carolina State University, Raleigh, NC 27606, USA
Min Chi
Department of Computer Science, North Carolina State University
Artificial Intelligence and Machine Learning in Real Life
Reinforcement Learning
Educational Technology
Healthcare