π€ AI Summary
This work addresses the limitations of existing global workspace architectures, which lack effective attention mechanisms and struggle to balance noise robustness with cross-task generalization in multimodal fusion. Inspired by cognitive neuroscience, we propose the first explicit modality selection attention mechanism tailored for global workspace frameworks. Our approach employs a learnable, top-down attention process to dynamically integrate and select relevant modality-specific information. Evaluated on the Simple Shapes and MM-IMDb 1.0 datasets, the proposed method significantly enhances robustness to input noise while achieving performance on par with state-of-the-art approaches on MM-IMDb 1.0. These results demonstrate its superior capacity for cross-task and cross-modal generalization, highlighting the efficacy of biologically inspired attention in multimodal reasoning systems.
π Abstract
Global Workspace Theory (GWT), inspired by cognitive neuroscience, posits that flexible cognition could arise via the attentional selection of a relevant subset of modalities within a multimodal integration system. This cognitive framework can inspire novel computational architectures for multimodal integration. Indeed, recent implementations of GWT have explored its multimodal representation capabilities, but the related attention mechanisms remain understudied. Here, we propose and evaluate a top-down attention mechanism to select modalities inside a global workspace. First, we demonstrate that our attention mechanism improves noise robustness of a global workspace system on two multimodal datasets of increasing complexity: Simple Shapes and MM-IMDb 1.0. Second, we highlight various cross-task and cross-modality generalization capabilities that are not shared by multimodal attention models from the literature. Comparing against existing baselines on the MM-IMDb 1.0 benchmark, we find our attention mechanism makes the global workspace competitive with the state of the art.