MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation

📅 2026-03-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses two limitations of standard multi-head attention: it lacks the functional bottleneck and global-integration capacity posited by cognitive science, and it suffers from quadratic computational complexity. To this end, we propose the first formalization of Global Workspace Theory as an attention architecture, featuring a trainable memory bank and an Abstract Conceptual Representation (ACR) integrated through a two-stage assemble-broadcast mechanism. This design enables efficient global information integration with local contextual modulation, achieving linear time complexity, supporting non-convex representation composition, and allowing direct transfer of pre-trained Transformer weights. The proposed model matches or surpasses strong baselines across diverse modalities, attaining an average GLUE score of 85.1 for language, 83.9% top-1 accuracy on ImageNet-1K for vision, and a 2.7% word error rate on LibriSpeech for speech.

📝 Abstract
The MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation) contextualization layer generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck and global integration mechanisms hypothesized in cognitive models of consciousness. MANAR addresses this by implementing a central workspace through a trainable memory of abstract concepts and an Abstract Conceptual Representation (ACR). The architecture follows a two-stage logic that maps directly to GWT mechanics: (i) an integration phase, in which retrieved memory concepts converge on the input stimuli to form a collective "mental image" (the ACR); and (ii) a broadcasting phase, in which this global state navigates and informs the contextualization of individual local tokens. We demonstrate that efficient linear-time scaling is a fundamental architectural byproduct of instantiating GWT's functional bottleneck: routing global information through a constant-sized ACR resolves the quadratic complexity inherent in standard attention. MANAR is a compatible re-parameterization of MHA with identical semantic roles for its projections, enabling knowledge transfer from pretrained Transformers via weight copying and thus overcoming the adoption barriers of structurally incompatible linear-time alternatives. MANAR also enables non-convex contextualization, synthesizing representations that provably lie outside the convex hull of the input tokens, a mathematical reflection of the creative synthesis described in GWT. Empirical evaluations confirm that MANAR matches or exceeds strong baselines across language (GLUE score of 85.1), vision (83.9% top-1 on ImageNet-1K), and speech (2.7% WER on LibriSpeech), positioning it as an efficient and expressive alternative to quadratic attention.
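The two-stage integrate-then-broadcast logic described above can be sketched as follows. This is a minimal single-head illustration under stated assumptions, not the paper's implementation: the function name `manar_sketch`, the shared query/key/value projections across both stages, and the use of plain softmax attention in each stage are simplifications for exposition. With the number of memory concepts m held constant, both stages cost O(n·m·d), i.e. linear in sequence length n.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def manar_sketch(X, M, Wq, Wk, Wv):
    """X: (n, d) input tokens; M: (m, d) trainable memory of abstract concepts.

    Stage 1 (integration): memory concepts attend to the input tokens,
    condensing them into a constant-sized global state (the "ACR").
    Stage 2 (broadcast): each token attends only to the ACR, so the cost
    is O(n*m*d) -- linear in n for a fixed number of concepts m.
    """
    d = X.shape[-1]
    A1 = softmax((M @ Wq) @ (X @ Wk).T / np.sqrt(d))    # (m, n) concept-to-token weights
    acr = A1 @ (X @ Wv)                                 # (m, d) collective "mental image"
    A2 = softmax((X @ Wq) @ (acr @ Wk).T / np.sqrt(d))  # (n, m) token-to-concept weights
    return A2 @ (acr @ Wv)                              # (n, d) contextualized tokens
```

Because every token reads from the same m-row ACR rather than from all n tokens, the global state acts as the functional bottleneck the abstract describes, and the quadratic token-to-token attention matrix never materializes.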
Problem

Research questions and friction points this paper is trying to address.

multi-head attention
Global Workspace Theory
functional bottleneck
global integration
quadratic complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory-augmented Attention
Global Workspace Theory
Abstract Conceptual Representation
Linear-time Attention
Non-convex Contextualization