🤖 AI Summary
This work addresses the lack of interpretable internal reasoning mechanisms in current AI systems on visual abstract reasoning tasks such as the Abstraction and Reasoning Corpus (ARC), where models typically rely on behavioral statistical matching rather than human-like rule induction. To overcome this limitation, the paper introduces a novel paradigm, "reasoning as a modality," which architecturally decouples a global controller from a grid-based workspace. This is realized through role-separated Transformer modules and an iterative rule-execution mechanism, enabling controller-driven, interpretable reasoning. Evaluated under the VARC vision-centric protocol, the proposed model achieves 62.6% accuracy on the ARC-1 benchmark, surpassing average human performance (60.2%) and all prior methods, while demonstrating more coherent and structurally consistent rule application.
📝 Abstract
The Abstraction and Reasoning Corpus (ARC) provides a compact laboratory for studying abstract reasoning, an ability central to human intelligence. Modern AI systems, including LLMs and ViTs, largely operate as sequence-of-behavior prediction machines: they match observable behaviors by modeling token statistics without a persistent, readable mental state. This creates a gap with human behavior: humans can explain an action by decoding their internal state, whereas AI systems produce fluent post-hoc rationalizations that are not grounded in such a state. We hypothesize that reasoning is a modality: reasoning should exist as a distinct channel, separate from the low-level workspace on which rules are applied. To test this hypothesis, we cast ARC solving as a visual reasoning problem and designed a novel role-separated Transformer block that splits global controller tokens from grid workspace tokens, enabling iterative rule execution. Trained and evaluated within the VARC vision-centric protocol, our method achieves 62.6% accuracy on ARC-1, surpassing average human performance (60.2%) and significantly outperforming prior methods. Qualitatively, our models exhibit more coherent rule-application structure than the dense ViT baseline, consistent with a shift away from plausible "probability blobs" toward controller-driven reasoning.
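To make the "role-separated block" idea concrete, here is a minimal PyTorch sketch of what such a block might look like: controller and grid tokens attend jointly over one shared attention matrix, but each role has its own projection weights, and "iterative rule execution" amounts to applying the block repeatedly so the controller can refine the workspace step by step. All names, shapes, and design choices below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a role-separated Transformer block (NOT the paper's code).
# Controller tokens and grid workspace tokens share one attention computation,
# but each role gets its own QKV and output projections.
import torch
import torch.nn as nn


class RoleSeparatedBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.head_dim = heads, dim // heads
        # Separate projections per role: global controller vs. grid workspace.
        self.qkv_ctrl = nn.Linear(dim, 3 * dim)
        self.qkv_grid = nn.Linear(dim, 3 * dim)
        self.out_ctrl = nn.Linear(dim, dim)
        self.out_grid = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, ctrl: torch.Tensor, grid: torch.Tensor):
        # ctrl: (B, Nc, D) controller tokens; grid: (B, Ng, D) workspace tokens.
        B, Nc, D = ctrl.shape
        # Role-specific QKV projections, then joint attention over all tokens.
        qkv = torch.cat([self.qkv_ctrl(self.norm(ctrl)),
                         self.qkv_grid(self.norm(grid))], dim=1)
        q, k, v = qkv.chunk(3, dim=-1)

        def split(t):  # (B, N, D) -> (B, heads, N, head_dim)
            return t.view(B, -1, self.heads, self.head_dim).transpose(1, 2)

        q, k, v = map(split, (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, -1, D)
        # Role-specific output projections with residual connections.
        ctrl = ctrl + self.out_ctrl(out[:, :Nc])
        grid = grid + self.out_grid(out[:, Nc:])
        return ctrl, grid


def iterate(block: RoleSeparatedBlock, ctrl, grid, steps: int = 3):
    """Iterative rule execution: the controller repeatedly rewrites the workspace."""
    for _ in range(steps):
        ctrl, grid = block(ctrl, grid)
    return ctrl, grid
```

Under this reading, "reasoning as a modality" lives in the controller tokens: they carry the inferred rule across iterations, while the grid tokens hold only the low-level workspace the rule is applied to.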