🤖 AI Summary
This study addresses the difficulty that existing large language model (LLM)-based intelligent tutors have in accommodating the diverse cognitive and communicative needs of learners with disabilities in special education. The authors propose Special-R1, a framework that extends reinforcement learning for LLM tutors with two components: a two-dimensional adaptive system prompt that pairs a difficulty-based support level with a disability-specific teaching style across five disability profiles, and a persona-aware Thinking Reward whose judge rubric is conditioned on the learner's disability profile. On a persona-augmented test set of 690 multi-turn dialogues, the full model raises persona-aware Fit from 6.75 for the generic baseline to 8.40 (+1.65), raises SPED-rubric Helpfulness from 0.720 to 0.768, and leads on the four-component Total (2.911, +0.064 over the runner-up), while staying within 0.01 of the strongest variant on the out-of-domain OpenLearnLM benchmark.
📝 Abstract
Large language models are increasingly deployed as intelligent tutors, yet research on aligning them for special education remains absent. Recent work has applied reinforcement learning to LLM tutors, but these methods target a generic learner in a single domain (mathematics) and do not address the cognitive and communicative diversity of learners with disabilities. We introduce Special-R1, a framework that extends pedagogical RL to special education through two components: (1) a two-dimensional adaptive system prompt that couples a difficulty-based support level with a disability-specific teaching style across five disability profiles; and (2) a persona-aware Thinking Reward whose judge rubric is conditioned on the learner's disability profile. On a persona-augmented test set of 690 multi-turn dialogues, our full model raises persona-aware Fit from 6.75 (generic baseline) to 8.40 (+1.65) and SPED-rubric Helpfulness from 0.720 to 0.768, leading on the four-component Total (2.911, +0.064 over the runner-up) while remaining within 0.01 of the strongest variant on the out-of-domain OpenLearnLM benchmark (8.53). Ablations show that the Thinking Reward becomes effective only in combination with adaptive prompting, and that residual weakness on specific learning disability in mathematics motivates targeted multimodal extensions.
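To make the first component concrete, the following is a minimal sketch of how a two-dimensional system prompt could be composed from a support level and a teaching style. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the support levels, the difficulty cut-offs, the names of the five profiles, or any prompt wording, so every label, threshold, and instruction string below (`SUPPORT_LEVELS`, `TEACHING_STYLES`, `support_level`, `build_system_prompt`) is a hypothetical placeholder.

```python
# All labels, thresholds, and instruction strings below are hypothetical
# placeholders; the abstract does not specify them.

SUPPORT_LEVELS = {
    "low": "Offer a brief hint and let the learner attempt each step first.",
    "medium": "Break the task into steps and check understanding after each one.",
    "high": "Model the first step explicitly, then guide the learner through the rest.",
}

# Five entries because the abstract mentions five disability profiles;
# the keys and style texts are stand-ins, not the paper's actual profiles.
TEACHING_STYLES = {
    "profile_1": "Use short sentences and give one instruction at a time.",
    "profile_2": "Repeat key terms and summarize before moving on.",
    "profile_3": "Describe every step in words rather than relying on layout.",
    "profile_4": "Keep a predictable turn structure and state what comes next.",
    "profile_5": "Use concrete everyday examples before abstract notation.",
}


def support_level(difficulty: float) -> str:
    """Map a difficulty score in [0, 1] to a support level (assumed cut-offs)."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    if difficulty < 1 / 3:
        return "low"
    if difficulty < 2 / 3:
        return "medium"
    return "high"


def build_system_prompt(difficulty: float, profile: str) -> str:
    """Compose one system prompt from the two independent dimensions."""
    level = support_level(difficulty)
    style = TEACHING_STYLES[profile]  # raises KeyError for an unknown profile
    return (
        "You are a tutor.\n"
        f"Support level ({level}): {SUPPORT_LEVELS[level]}\n"
        f"Teaching style ({profile}): {style}"
    )


if __name__ == "__main__":
    print(build_system_prompt(0.8, "profile_3"))
```

Keeping the two dimensions in separate tables means each can vary independently: with three assumed support levels and five profiles, this sketch yields fifteen distinct prompts without writing any of them by hand.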