Reasoning is a Modality

📅 2026-01-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of interpretable internal reasoning mechanisms in current AI systems on visual abstract reasoning tasks such as the Abstraction and Reasoning Corpus (ARC), where models often rely on behavioral statistical matching rather than human-like rule induction. To overcome this limitation, the paper introduces a novel paradigm, "reasoning as a modality," which explicitly decouples a global controller from a grid-based workspace at the architectural level. This is achieved through role-separated Transformer modules and an iterative rule-execution mechanism, enabling controller-driven, interpretable reasoning. Evaluated under the vision-centric VARC protocol, the proposed model achieves 62.6% accuracy on the ARC-1 benchmark, surpassing average human performance (60.2%) and all existing methods, while demonstrating more coherent and structurally consistent rule application.

📝 Abstract
The Abstraction and Reasoning Corpus (ARC) provides a compact laboratory for studying abstract reasoning, an ability central to human intelligence. Modern AI systems, including LLMs and ViTs, largely operate as sequence-of-behavior prediction machines: they match observable behaviors by modeling token statistics without a persistent, readable mental state. This creates a gap with human-like behavior: humans can explain an action by decoding internal state, while AI systems can produce fluent post-hoc rationalizations that are not grounded in such a state. We hypothesize that reasoning is a modality: reasoning should exist as a distinct channel separate from the low-level workspace on which rules are applied. To test this hypothesis, we frame ARC tasks as a visual reasoning problem and design a novel role-separated transformer block that splits global controller tokens from grid workspace tokens, enabling iterative rule execution. Trained and evaluated within the vision-centric VARC protocol, our method achieved 62.6% accuracy on ARC-1, surpassing average human performance (60.2%) and significantly outperforming prior methods. Qualitatively, our models exhibit more coherent rule-application structure than the dense ViT baseline, consistent with a shift away from plausible probability blobs toward controller-driven reasoning.
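The role-separated block described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the single attention head, the token counts, and every name here (`role_separated_attention`, the weight matrices) are assumptions for illustration only; the actual architecture likely masks or gates cross-role attention and stacks such blocks with the usual transformer machinery.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def role_separated_attention(controller, workspace, W_q, W_k, W_v):
    """One shared attention pass over [controller | workspace] tokens.

    Controller tokens (global rule state) and grid workspace tokens
    are concatenated for attention but kept as separate streams on
    input and output, so the rule channel stays distinct from the grid.
    """
    n_ctrl = controller.shape[0]
    x = np.concatenate([controller, workspace], axis=0)      # (n, d)
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # all-to-all
    out = attn @ v
    return out[:n_ctrl], out[n_ctrl:]                        # re-split roles

# Iterative rule execution: apply the block repeatedly so the controller
# can progressively refine the rule it imposes on the grid workspace.
rng = np.random.default_rng(0)
d = 8
ctrl = rng.standard_normal((2, d))   # hypothetical global controller tokens
grid = rng.standard_normal((9, d))   # a flattened 3x3 grid workspace
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
for _ in range(4):
    ctrl, grid = role_separated_attention(ctrl, grid, W_q, W_k, W_v)
```

The point of the split-return is the architectural claim itself: the controller stream carries the "reasoning modality," while the workspace stream only holds the grid being transformed, and the two never merge into one undifferentiated token soup.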
Problem

Research questions and friction points this paper is trying to address.

abstract reasoning
mental state
reasoning modality
ARC
explainability
Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning as modality
role-separated transformer
Abstraction and Reasoning Corpus
controller-driven reasoning
visual reasoning
Zhiguang Liu
University of Missouri - Columbia
Yi Shang
Professor, EECS Dept, University of Missouri