Error as a Lens: Probing LLM Reasoning through Synthetic Misconception Generation

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Authentic student error corpora are scarce, and generating synthetic misconceptions that match a specific cognitive error type is much harder than generating an arbitrary wrong answer. To address this, the authors propose a two-agent framework: a Generation Agent (GA) produces erroneous solutions conditioned on one of five cognitive error classes adapted from the revised Bloom's taxonomy, while an Examination Agent (EA) checks that each solution is both incorrect and consistent with its target class. The framework also grounds generation in the reference answer. The result is a reusable recipe for building class-stratified synthetic error datasets where real student data is unavailable. Experiments on TheoremQA show that the framework can produce class-consistent error samples, that targeted error generation is substantially harder than free-form incorrect-answer generation, and that answer grounding contributes more than expanded examples or external textbook content.

📝 Abstract

Personalized tutoring, teacher training, and education research need access to targeted synthetic misconceptions, but privacy and IRB constraints make labelled corpora of real student errors scarce. LLMs could in principle generate synthetic errors at scale, but producing an arbitrary wrong answer is easy for a modern LLM while producing one that matches a specified cognitive failure mode is much harder. We present a framework that generates errors targeted to a five-class taxonomy adapted from the revised Bloom's taxonomy, evaluated on questions from the TheoremQA dataset. A Generation Agent (GA) drafts a candidate erroneous solution conditioned on a target class, and an Examination Agent (EA) judges whether the draft is incorrect and class-consistent. The framework yields a reusable recipe for building class-stratified synthetic error datasets where authentic student corpora are unavailable. As a secondary diagnostic, targeted error generation is substantially harder than free-form incorrect-answer generation, and answer-grounding contributes more than expanded examples or external textbook content.
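The GA/EA interaction described in the abstract can be pictured as a generate-then-examine loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `generate_targeted_error`, `Verdict`, and `ErrorSample` are hypothetical, both agents are passed in as plain callables (in practice each would wrap an LLM call), the retry limit `max_attempts` is an assumption, and passing the reference answer to both agents is one possible reading of "answer-grounding". The five class labels are not listed here, so the target class is left as a free string.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """EA judgement on one draft (hypothetical structure)."""
    is_incorrect: bool       # the draft does not reach the reference answer
    class_consistent: bool   # the error matches the requested target class


@dataclass
class ErrorSample:
    """One accepted synthetic error (hypothetical structure)."""
    question: str
    target_class: str
    solution: str


# Assumed agent signatures; each would normally wrap an LLM call.
# GA: (question, reference_answer, target_class) -> draft solution
Generator = Callable[[str, str, str], str]
# EA: (question, reference_answer, target_class, draft) -> Verdict
Examiner = Callable[[str, str, str, str], Verdict]


def generate_targeted_error(
    question: str,
    reference_answer: str,
    target_class: str,
    generate: Generator,
    examine: Examiner,
    max_attempts: int = 3,  # assumption: the source does not state a retry policy
) -> Optional[ErrorSample]:
    """Return the first draft the EA accepts, or None if every attempt is rejected."""
    for _ in range(max_attempts):
        draft = generate(question, reference_answer, target_class)
        verdict = examine(question, reference_answer, target_class, draft)
        # Keep a draft only if it is both wrong and wrong in the requested way.
        if verdict.is_incorrect and verdict.class_consistent:
            return ErrorSample(question, target_class, draft)
    return None
```

Requiring both checks to pass reflects the distinction the abstract draws: a draft that is merely wrong would satisfy `is_incorrect` alone, while the harder, targeted task also requires `class_consistent`. Running this function over every question and every target class would produce the class-stratified dataset the abstract mentions.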

Problem

Research questions and friction points this paper is trying to address.

synthetic misconceptions

cognitive failure mode

targeted error generation

Bloom's taxonomy

LLM reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic misconception generation

targeted error generation

cognitive failure modes