Temporal Coding as a Substrate for Sensorimotor Object Inference: A Spiking Reinterpretation of Thousand Brains Architecture

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of the dense-vector encoding used in current Thousand Brains implementations: the features sensed at each tactile contact are treated as an unordered set, so the order in which features are encountered is discarded and objects that share the same features in different spatial arrangements become hard to tell apart. The authors propose a spiking neural network method that replaces dense vectors with rank-order spike packets, in which the most strongly activated neuron fires first. The time gap between successive packets implicitly encodes sensor displacement, while a spike-timing-dependent plasticity (STDP) rule embeds traversal direction into synaptic weights. A learnable parameter λ sets how much weight earlier contacts receive relative to recent ones, adapting to each object's geometry. The approach integrates temporal coding into the Thousand Brains architecture, removes the need for explicit coordinate computation, and aligns more closely with biological mechanisms. In three synthetic experiments, temporal coding reaches 100% discrimination accuracy on objects with identical features but distinct spatial arrangements, where dense accumulation performs at chance; it keeps a 30–50 percentage point advantage across all tested noise levels; and λ converges to distinct values that reflect each object's geometric complexity.
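The rank-order encoding summarized above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `window_ms` and `ms_per_unit` parameters, and the linear mapping from displacement to inter-packet gap are all assumptions made for illustration, and the sketch uses plain Python rather than the NumPy implementation the paper describes.

```python
def rank_order_packet(activations, window_ms=10.0, threshold=0.0):
    """Turn one contact's activations into (neuron_index, spike_time_ms) pairs.

    Each neuron above `threshold` fires once. The most strongly activated
    neuron fires first (t = 0); later ranks are spaced evenly over `window_ms`.
    (Even spacing is an illustrative assumption.)
    """
    active = [(i, a) for i, a in enumerate(activations) if a > threshold]
    if not active:
        return []
    # Descending activation; ties broken by neuron index for determinism.
    active.sort(key=lambda pair: (-pair[1], pair[0]))
    step = window_ms / len(active)
    return [(idx, rank * step) for rank, (idx, _) in enumerate(active)]


def packet_train(contacts, displacements, ms_per_unit=5.0, window_ms=10.0):
    """Place successive packets on one timeline.

    The onset gap between packets k-1 and k is `window_ms` plus a term
    proportional to the sensor displacement between those contacts, so the
    distance travelled is carried by timing rather than by coordinates.
    (The linear mapping is a hypothetical choice.)
    """
    train, onset = [], 0.0
    for k, activations in enumerate(contacts):
        if k > 0:
            onset += window_ms + ms_per_unit * displacements[k - 1]
        packet = rank_order_packet(activations, window_ms)
        train.append([(i, onset + t) for i, t in packet])
    return train


if __name__ == "__main__":
    # Two contacts, with the sensor moving 2.0 units between them.
    for packet in packet_train([[1.0, 0.5], [0.5, 1.0]], [2.0]):
        print(packet)
```

Under these assumptions, two contacts with the same activation values but swapped neuron identities yield packets with opposite firing order, which is the information a dense accumulation that ignores order would lose.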

📝 Abstract

The Thousand Brains Theory (TBT) and its open-source Monty framework model object recognition through sensorimotor inference: identifying objects by actively moving a sensor across their surface and building evidence contact by contact. The current implementation encodes each contact as a dense floating-point vector. While Monty tracks inter-step displacement and accumulates evidence across contacts, it treats the feature activation pattern at each contact as an unordered set, so the directional sequence in which features are encountered carries no representational weight. In TBT, the sequence of contacts carries spatial meaning: knowing that feature A was felt before feature B during a left-to-right sweep tells you something about where A and B sit on the object. Dense vectors discard this ordering. We propose replacing dense vectors with rank-order spike packets: each contact produces a brief burst of neural events in which the most strongly activated neuron fires first. The time gap between successive bursts implicitly encodes sensor displacement without explicit coordinate calculations. A biologically motivated learning rule, spike-timing-dependent plasticity (STDP), encodes traversal direction into synaptic weights. A learnable parameter lambda adjusts reliance on earlier versus recent contacts, adapting to each object's geometry. We derive three testable predictions and specify an implementation of four components in approximately 450 lines of NumPy. Three synthetic experiments confirm the core claims: temporal coding achieves perfect discrimination accuracy on objects with identical features in different spatial arrangements, where dense accumulation performs at chance; temporal coding maintains a 30-50 percentage point advantage across all tested noise levels; and the adaptive lambda converges to distinct values that reflect each object's geometric complexity. End-to-end evaluation on Monty's YCB benchmark is left for future work.
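To make the last two mechanisms concrete, here is a small sketch of how a pair-based STDP rule can store traversal direction, and how a lambda-weighted blend can trade off earlier against recent contacts. The abstract does not give the exact equations, so everything here is assumed: the exponential STDP window, the constants `a_plus`, `a_minus`, and `tau_ms`, and the exponential-blend form of the lambda update are standard textbook choices, not the authors' specification. Lambda is passed in as a fixed value; how it is learned is not shown.

```python
import math


def stdp_update(weights, spikes, a_plus=0.05, a_minus=0.05, tau_ms=20.0):
    """Apply pair-based STDP in place to weights[pre][post].

    `spikes` is a list of (neuron_index, time_ms). A synapse is strengthened
    when its presynaptic neuron fires before its postsynaptic neuron and
    weakened when the order is reversed, so firing order (and hence sweep
    direction) ends up stored as a weight asymmetry.
    """
    for pre, t_pre in spikes:
        for post, t_post in spikes:
            if pre == post:
                continue
            dt = t_post - t_pre
            if dt > 0:    # pre before post: potentiate
                weights[pre][post] += a_plus * math.exp(-dt / tau_ms)
            elif dt < 0:  # post before pre: depress
                weights[pre][post] -= a_minus * math.exp(dt / tau_ms)
    return weights


def blend_evidence(matches, lam):
    """Accumulate per-contact match scores with history weight `lam`.

    lam near 1 leans on earlier contacts; lam = 0 uses only the latest one.
    (Hypothetical form of the lambda-weighted update.)
    """
    evidence = 0.0
    for m in matches:
        evidence = lam * evidence + (1.0 - lam) * m
    return evidence


if __name__ == "__main__":
    # Feature A (neuron 0) is felt 10 ms before feature B (neuron 1).
    w = stdp_update([[0.0, 0.0], [0.0, 0.0]], [(0, 0.0), (1, 10.0)])
    print(w[0][1] > 0 > w[1][0])
    print(blend_evidence([1.0, 0.0], lam=0.5))
```

With this rule, a sweep that meets A before B leaves the A-to-B weight positive and the B-to-A weight negative; reversing the sweep reverses the signs. That asymmetry is what lets two objects with identical features in mirrored arrangements produce different learned weights.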

Problem

Research questions and friction points this paper is trying to address.

Temporal Coding

Sensorimotor Inference

Object Recognition

Spike Timing

Feature Sequence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Coding

Spiking Neural Networks

Sensorimotor Inference