ConceptKT: A Benchmark for Concept-Level Deficiency Prediction in Knowledge Tracing

📅 2026-03-25
🤖 AI Summary
This work addresses a limitation of traditional knowledge tracing, which focuses solely on predicting answer correctness and offers no fine-grained diagnosis of students' conceptual misunderstandings. To this end, the paper introduces a novel task—concept-level deficiency prediction—and presents ConceptKT, a benchmark dataset annotated with both the concepts required to solve each problem and the specific missing concepts underlying incorrect responses. Methodologically, the authors apply large language models (LLMs) and large reasoning models (LRMs) via in-context learning, and propose a history selection strategy grounded in concept alignment and semantic similarity to identify the specific concepts a student is likely to struggle with on future problems. Experimental results demonstrate that this approach significantly improves both answer correctness prediction and concept-level deficiency identification, offering a new paradigm for fine-grained learning diagnosis.

📝 Abstract
Knowledge Tracing (KT) is a critical technique for modeling student knowledge to support personalized learning. However, most KT systems focus on binary correctness prediction and cannot diagnose the underlying conceptual misunderstandings that lead to errors. Such fine-grained diagnostic feedback is essential for designing targeted instruction and effective remediation. In this work, we introduce the task of concept-level deficiency prediction, which extends traditional KT by identifying the specific concepts a student is likely to struggle with on future problems. We present ConceptKT, a dataset annotated with labels that capture both the concepts required to solve each question and the missing concepts underlying incorrect responses. We investigate in-context learning approaches to KT and evaluate the diagnostic capabilities of various Large Language Models (LLMs) and Large Reasoning Models (LRMs). Different strategies for selecting informative historical records are explored. Experimental results demonstrate that selecting response histories based on conceptual alignment and semantic similarity leads to improved performance on both correctness prediction and concept-level deficiency identification.
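The history selection strategy described in the abstract can be sketched in a minimal form: score each past response record by a weighted mix of concept-set alignment (Jaccard overlap) and semantic similarity to the target question, then keep the top-k records as in-context examples. This is an illustrative reconstruction, not the authors' implementation; the `alpha` weight, the record schema (`question`, `concepts`), and the bag-of-words cosine (standing in for a real embedding model) are all assumptions.

```python
from collections import Counter
import math

def concept_overlap(a, b):
    """Jaccard overlap between two concept sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine_sim(text_a, text_b):
    """Bag-of-words cosine similarity; a cheap stand-in for
    sentence-embedding similarity used here for illustration."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_history(history, target, k=2, alpha=0.5):
    """Rank past records by a weighted combination of concept alignment
    and semantic similarity to the target question; return the top k."""
    scored = sorted(
        history,
        key=lambda r: alpha * concept_overlap(r["concepts"], target["concepts"])
                      + (1 - alpha) * cosine_sim(r["question"], target["question"]),
        reverse=True,
    )
    return scored[:k]

# Hypothetical student history and target question.
history = [
    {"question": "solve the linear equation 2x + 3 = 7", "concepts": {"linear_equation"}},
    {"question": "find the area of a circle with radius 3", "concepts": {"circle", "area"}},
    {"question": "solve the quadratic equation x^2 - 4 = 0", "concepts": {"quadratic_equation"}},
]
target = {"question": "solve the linear equation 5x - 2 = 8", "concepts": {"linear_equation"}}

selected = select_history(history, target, k=1)
```

With these toy records, the linear-equation item is selected because it matches the target on both the concept set and the surface wording; records chosen this way then serve as the in-context examples shown to the LLM or LRM.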
Problem

Research questions and friction points this paper is trying to address.

Knowledge Tracing
Concept-level Deficiency Prediction
Diagnostic Feedback
Conceptual Misunderstanding
Personalized Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-level Deficiency Prediction
Knowledge Tracing
Large Language Models
Diagnostic Feedback
In-context Learning