🤖 AI Summary
This work addresses the poor generalization of deep learning models on symbolic sequence reasoning tasks (e.g., arithmetic addition). We propose a modular neural architecture inspired by Global Workspace Theory (GWT), comprising dedicated functional modules (input, increment, and output) orchestrated by an LSTM-based controller that dynamically routes information to form interpretable, serialized operation chains, emulating System-2-style reasoning. Crucially, we introduce the first end-to-end differentiable implementation of GWT for learnable operation orchestration, supporting both hand-designed (one-hot digit) and learned (MNIST image) representations. Experiments demonstrate that the model achieves superior interpolation and extrapolation performance on addition tasks with significantly fewer parameters than LSTMs and Transformers, while generalizing strongly to unseen numerical combinations. These results show that modular, controller-guided routing substantially enhances symbolic reasoning capability.
📝 Abstract
We present a model inspired by the Global Workspace Theory that integrates specialized modules to perform a sequential reasoning task. A controller selectively routes information between modules through the workspace using a gating mechanism. This approach allows the model to chain operations by iteratively broadcasting information between specialized domains, mimicking System-2 reasoning. We evaluate the model on a simple addition task, where two addends must be summed. The task can be solved by routing information sequentially through an Input module, an Increment module (applied multiple times), and finally an Output module. We consider two implementations of this system, of increasing complexity. First, using hand-designed modules operating on one-hot digit representations, the controller (an LSTM recurrent network) learns to select the appropriate modules (input, increment, output) in the appropriate sequence. Second, we replace the hand-designed modules with learned representation modules for MNIST images and an increment module trained on the task objectives; here again, the controller learns the sequential module selection needed to solve the task. Finally, we show that the Global Workspace model, despite having fewer parameters, outperforms LSTMs and Transformers when tested on unseen addition operations (both interpolations and extrapolations of the operations seen during training). Our results highlight the potential of architectures inspired by Global Workspace Theory to enhance deep learning's reasoning capabilities.
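To make the routing scheme concrete, the following is a minimal sketch of the hand-designed one-hot variant. All function names are hypothetical, and the controller's action sequence is hard-coded here rather than learned by an LSTM with a gating mechanism as in the paper; it only illustrates how chaining Input, Increment, and Output operations over a shared workspace solves `a + b`.

```python
# Toy sketch of GWT-style module routing on one-hot digits (illustrative only;
# in the actual model an LSTM controller *learns* this module sequence).

def one_hot(d, n=10):
    # Encode digit d as a one-hot vector of length n.
    v = [0] * n
    v[d] = 1
    return v

def input_module(addend):
    # Write the first addend into the workspace as a one-hot vector.
    return one_hot(addend)

def increment_module(workspace):
    # Rotate the one-hot vector by one position: digit -> digit + 1 (mod 10).
    return workspace[-1:] + workspace[:-1]

def output_module(workspace):
    # Read out the digit currently encoded in the workspace.
    return workspace.index(1)

def add(a, b):
    # Scripted controller policy for a + b:
    # select Input once, Increment b times, then Output once.
    ws = input_module(a)
    for _ in range(b):
        ws = increment_module(ws)
    return output_module(ws)

print(add(3, 4))  # -> 7
```

Note that with a fixed-size one-hot workspace, this toy version performs addition modulo 10; the point is only the serialized chain of module selections, not the arithmetic itself.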