🤖 AI Summary
Large language models (LLMs) frequently generate coherent yet incorrect mathematical reasoning paths by relying on unverified assumptions—a phenomenon termed “reasoning hallucination.” Existing approaches primarily address factual hallucinations or perform post-hoc verification, lacking mechanisms for verifiable control over the reasoning process itself.
Method: We propose the Audit-of-Understanding (AoU) framework, the first to decompose mathematical reasoning into three stages: hypothesis generation, supportive auditing, and conditional reasoning grounded exclusively on verified premises. AoU introduces a posterior-constrained inference mechanism integrating selective prediction and rejection learning, accompanied by theoretical risk bounds and formal verifiability guarantees.
Contribution/Results: Evaluated on GSM8K, MultiArith, and SVAMP, AoU significantly outperforms chain-of-thought and self-consistency baselines, achieving 30–45% absolute accuracy gains. It yields more faithful, interpretable, and verifiable reasoning—demonstrating both empirical efficacy and principled controllability.
📝 Abstract
Large language models (LLMs) often generate reasoning traces that appear coherent but rest on unsupported assumptions, leading to hallucinated conclusions. Prior work mainly addresses factual hallucinations or relies on post-hoc verification, leaving reasoning-induced hallucinations largely unaddressed. We propose Audit-of-Understanding (AoU), a framework that constrains inference to validated premises through three phases: (1) decomposing a query into candidate assumptions, (2) auditing their support, and (3) conditioning inference only on the validated subset. Formally, AoU is *posterior-constrained inference*, connecting to selective prediction and rejection learning. Our contributions are threefold: (i) theoretical guarantees under perfect validation, (ii) excess-risk bounds under imperfect audits, and (iii) tractability analysis. Empirically, AoU improves both accuracy and faithfulness on GSM8K, MultiArith, and SVAMP, achieving up to +30% gains on GSM8K, +45% on MultiArith, and consistent +20–28% improvements on SVAMP over Chain-of-Thought, Self-Consistency, and CoT-Decoding. Code is available at https://anonymous.4open.science/r/audit-of-understanding-E28B.
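The three-phase pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: `generate_assumptions`, `audit`, and the `solve` callback stand in for LLM calls, and are stubbed here with toy rule-based logic so the control flow (decompose, audit, condition, or abstain) is runnable.

```python
def generate_assumptions(query):
    # Phase 1 (sketch): decompose the query into candidate assumptions.
    # A real system would prompt an LLM; here we just split on commas.
    return [clause.strip() for clause in query.split(",") if clause.strip()]

def audit(assumption, supported_facts):
    # Phase 2 (sketch): audit each candidate for support.
    # A real auditor would verify against context or a checker model.
    return assumption in supported_facts

def aou_answer(query, supported_facts, solve):
    # Phase 3: condition inference ONLY on the validated subset.
    # If nothing survives the audit, abstain (selective prediction /
    # rejection) rather than reason from unverified premises.
    candidates = generate_assumptions(query)
    validated = [a for a in candidates if audit(a, supported_facts)]
    if not validated:
        return None  # reject instead of hallucinating an answer
    return solve(validated)

facts = {"Tom has 3 apples", "he buys 2 more"}
result = aou_answer(
    "Tom has 3 apples, he buys 2 more, how many in total?",
    facts,
    lambda premises: "answer conditioned on: " + "; ".join(premises),
)
```

The key design point mirrored here is that unvalidated candidates never reach the final inference step, and an empty validated set triggers abstention instead of a guess.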