Error as a Lens: Probing LLM Reasoning through Synthetic Misconception Generation

📅 2026-05-27
🤖 AI Summary
This work addresses the scarcity of labelled corpora of real student errors, which makes it hard to obtain synthetic misconceptions targeted to specific cognitive failure modes. The authors propose a two-agent framework: a Generation Agent (GA) drafts an erroneous solution conditioned on one of five error classes adapted from the revised Bloom's taxonomy, and an Examination Agent (EA) checks that the draft is both incorrect and consistent with the target class. The framework also uses answer grounding to improve generation quality. The result is a reusable recipe for building class-stratified synthetic error datasets where authentic student corpora are unavailable. Experiments on questions from TheoremQA indicate that targeted error generation is substantially harder than free-form incorrect-answer generation, and that answer grounding contributes more than expanded examples or external textbook content.
📝 Abstract
Personalized tutoring, teacher training, and education research need access to targeted synthetic misconceptions, but privacy and IRB constraints make labelled corpora of real student errors scarce. LLMs could in principle generate synthetic errors at scale, but producing an arbitrary wrong answer is easy for a modern LLM while producing one that matches a specified cognitive failure mode is much harder. We present a framework that generates errors targeted to a five-class taxonomy adapted from the revised Bloom's taxonomy, evaluated on questions from the TheoremQA dataset. A Generation Agent (GA) drafts a candidate erroneous solution conditioned on a target class, and an Examination Agent (EA) judges whether the draft is incorrect and class-consistent. The framework yields a reusable recipe for building class-stratified synthetic error datasets where authentic student corpora are unavailable. As a secondary diagnostic, targeted error generation is substantially harder than free-form incorrect-answer generation, and answer-grounding contributes more than expanded examples or external textbook content.
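The generate-then-examine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class labels are placeholders (the abstract does not name the five classes), the `generate` and `examine` callables stand in for LLM calls, the retry limit is an assumption, and passing the reference answer to the generator is only one possible reading of "answer-grounding".

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Placeholder labels: the abstract does not name the five error classes.
ERROR_CLASSES = ("class_1", "class_2", "class_3", "class_4", "class_5")


@dataclass
class Verdict:
    """Hypothetical Examination Agent output."""
    is_incorrect: bool         # the draft really is a wrong solution
    is_class_consistent: bool  # the error matches the requested class


@dataclass
class SyntheticError:
    question: str
    target_class: str
    solution: str
    attempts: int


def generate_targeted_error(
    question: str,
    reference_answer: str,
    target_class: str,
    generate: Callable[[str, str, str], str],
    examine: Callable[[str, str, str, str], Verdict],
    max_attempts: int = 3,  # assumed retry budget, not from the paper
) -> Optional[SyntheticError]:
    """Draft with the GA, keep the draft only if the EA accepts it."""
    if target_class not in ERROR_CLASSES:
        raise ValueError(f"unknown error class: {target_class}")
    for attempt in range(1, max_attempts + 1):
        # GA: draft an erroneous solution conditioned on the target class.
        # The reference answer is passed in as an assumed form of answer-grounding.
        draft = generate(question, reference_answer, target_class)
        # EA: judge incorrectness and class consistency.
        verdict = examine(question, reference_answer, target_class, draft)
        if verdict.is_incorrect and verdict.is_class_consistent:
            return SyntheticError(question, target_class, draft, attempt)
    return None  # no draft passed examination within the budget


# Deterministic stand-ins for the two LLM calls, for illustration only.
def stub_generate(question: str, reference_answer: str, target_class: str) -> str:
    return f"[{target_class}] flawed working for '{question}', answer = wrong"


def stub_examine(question: str, reference_answer: str,
                 target_class: str, draft: str) -> Verdict:
    return Verdict(
        is_incorrect=f"answer = {reference_answer}" not in draft,
        is_class_consistent=draft.startswith(f"[{target_class}]"),
    )


if __name__ == "__main__":
    sample = generate_targeted_error(
        "What is 2 + 2?", "4", "class_1", stub_generate, stub_examine
    )
    print(sample)
```

Keeping the two agents as plain callables means the same loop could be run once per target class to build a class-stratified dataset, and drafts the examiner rejects are dropped rather than relabelled.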
Problem

Research questions and friction points this paper is trying to address.

synthetic misconceptions
cognitive failure mode
targeted error generation
Bloom's taxonomy
LLM reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic misconception generation
targeted error generation
cognitive failure modes
LLM reasoning probing
agent-based framework