EDIT: Evidence-Diagnosed Intervention Training for Rule-Faithful LLM Grading

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models struggle with complex, multi-subject grading tasks: they often fail to apply rubrics rigorously or to ground judgments in specific evidence from the student response, and existing training methods cannot diagnose which reasoning steps are erroneous or where the model's belief about the final score shifts harmfully. To address this, the authors propose EDIT, a two-stage training framework. In the first stage, it uses internal model states (the posterior belief over the final score and input-grounded scoring signals) to pinpoint flawed reasoning steps and revise only those steps against a detailed rubric. In the second stage, it applies belief-guided reward shaping during reinforcement learning fine-tuning to suppress large harmful belief deviations while still allowing helpful exploration. Its main contributions are an internal-state-based diagnostic mechanism for targeted intervention in grading reasoning and a belief-oriented reward function that balances exploration with rubric adherence. Experiments on two real-world, multi-subject grading benchmarks show that EDIT consistently outperforms strong baselines both in-domain and out-of-domain, and ablation studies indicate that internal-state diagnosis drives the performance gains.

📝 Abstract

Reliable rubric grading requires more than accurate score prediction. Each judgement must be grounded in the mark scheme and evidence from the student answer. Existing credit-assignment and intervention methods, primarily designed for self-contained reasoning tasks such as mathematical reasoning, struggle in this setting because they do not identify where grading reasoning goes wrong or how the model's belief about the final mark changes during reasoning. We propose Evidence-Diagnosed Intervention Training (EDIT), a two-phase framework for training more rubric-faithful LLM graders. First, EDIT-SFT locates problematic reasoning steps using internal model signals: posterior belief over the final mark and input-grounding scores. It then revises only these local steps with help from a rubric checklist. Second, EDIT-RL calibrates the grader with belief-guided reward shaping, penalising large harmful belief drifts while still allowing helpful exploration. Experiments on two real-world, multi-subject grading benchmarks demonstrate that EDIT consistently outperforms strong supervised fine-tuning and reinforcement learning baselines on both in-domain and out-of-domain splits, with ablation studies confirming that internal-state diagnostics drive these gains.
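The abstract does not give formulas, so the following is only a minimal illustrative sketch of how the two ideas might fit together, not the authors' implementation. It assumes the belief at each reasoning step is a probability distribution over integer marks, that "harmful drift" means the expected mark moving further from a known reference mark, and that grounding is a score in [0, 1]. All function names, thresholds (`drift_tol`, `min_grounding`), and the `penalty` weight are hypothetical.

```python
from typing import List, Sequence


def expected_mark(belief: Sequence[float]) -> float:
    """Expected mark under a belief over marks 0..K (assumed representation)."""
    return sum(mark * p for mark, p in enumerate(belief))


def belief_drifts(beliefs: Sequence[Sequence[float]], gold_mark: float) -> List[float]:
    """Change in absolute error of the expected mark caused by each step.

    beliefs[0] is the belief before any reasoning; beliefs[i + 1] is the
    belief after step i. A positive drift moves the grader away from the
    reference mark (harmful); a negative drift moves it closer (helpful).
    """
    errors = [abs(expected_mark(b) - gold_mark) for b in beliefs]
    return [errors[i + 1] - errors[i] for i in range(len(errors) - 1)]


def flag_steps(drifts: Sequence[float], grounding: Sequence[float],
               drift_tol: float = 0.5, min_grounding: float = 0.3) -> List[int]:
    """Indices of steps to revise: large harmful drift or weak grounding."""
    return [i for i, (d, g) in enumerate(zip(drifts, grounding))
            if d > drift_tol or g < min_grounding]


def shaped_reward(outcome_reward: float, drifts: Sequence[float],
                  drift_tol: float = 0.5, penalty: float = 0.2) -> float:
    """Outcome reward minus a penalty on harmful drift beyond the tolerance.

    Helpful (negative) and small drifts are not penalised, which leaves
    room for exploration.
    """
    excess = sum(max(0.0, d - drift_tol) for d in drifts)
    return outcome_reward - penalty * excess


# Toy trajectory: marks 0..2, reference mark 2, two reasoning steps.
beliefs = [[0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.7, 0.2, 0.1]]
drifts = belief_drifts(beliefs, gold_mark=2)
flagged = flag_steps(drifts, grounding=[0.8, 0.9])
reward = shaped_reward(1.0, drifts)
```

In this toy trajectory the first step moves the expected mark toward the reference and is left alone, while the second step pulls it sharply away and is the only one flagged for local revision. The same drift values then reduce the reward, so under these assumptions the diagnostic signal and the training signal share a single definition of harm.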

Problem

Research questions and friction points this paper is trying to address.

rubric-faithful grading

evidence grounding

credit assignment

belief drift

LLM grading

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evidence-Diagnosed Intervention Training

rubric-faithful grading

belief-guided reward shaping