🤖 AI Summary
Conventional attention mechanisms in continuous-time (CT) modeling suffer from both discretization artifacts and biological implausibility. Method: We propose the Neuronal Attention Circuit (NAC), a biologically interpretable CT-attention framework that models attention logits as solutions to first-order ordinary differential equations (ODEs) with nonlinear gated coupling, integrated with the neuroanatomically inspired Neural Circuit Policy (NCP) wiring scheme. NAC incorporates sparse sensory gating and a dual-head sparse backbone, and supports three interchangeable logit-computation modes: explicit Euler integration, closed-form analytical solution, and steady-state approximation. Contribution/Results: Theoretical analysis establishes state stability, bounded approximation error, and universal approximation capability. Experiments demonstrate that NAC matches or surpasses state-of-the-art baselines in irregular time-series classification, autonomous-driving lane keeping, and industrial remaining-useful-life prediction, achieving competitive accuracy with moderate runtime and memory overhead relative to typical CT models.
📝 Abstract
Attention improves representation learning over RNNs, but its discrete nature limits continuous-time (CT) modeling. We introduce the Neuronal Attention Circuit (NAC), a novel, biologically plausible CT-attention mechanism that reformulates attention logit computation as the solution to a linear first-order ODE with nonlinear interlinked gates, derived by repurposing the *C. elegans* Neural Circuit Policy (NCP) wiring mechanism. NAC replaces dense projections with sparse sensory gates for key-query projections and a sparse backbone network with two heads for computing *content-target* and *learnable time-constant* gates, enabling efficient adaptive dynamics. NAC supports three attention logit computation modes: (i) explicit Euler integration, (ii) exact closed-form solution, and (iii) steady-state approximation. To reduce memory footprint, we implement a sparse Top-*K* pairwise concatenation scheme that selectively curates key-query interactions. We provide rigorous theoretical guarantees, including state stability, bounded approximation errors, and universal approximation. Empirically, we evaluate NAC across diverse domains, including irregular time-series classification, lane keeping for autonomous vehicles, and industrial prognostics. We observe that NAC matches or outperforms competing baselines in accuracy and occupies an intermediate position in runtime and memory efficiency compared with several CT baselines.
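To make the three logit-computation modes concrete, below is a minimal scalar sketch of a linear first-order logit ODE of the form dℓ/dt = −ℓ/τ + g, where `g` stands in for a content-target gate value and `tau` for a learnable time constant. This is an illustrative assumption, not the paper's full mechanism: NAC's actual gates are nonlinear, interlinked, and produced by sparse NCP-style wiring, and the function names here are hypothetical.

```python
import numpy as np

def euler_logit(g, tau, t, dt=0.01, l0=0.0):
    """Mode (i): explicit Euler integration of dl/dt = -l/tau + g."""
    l = l0
    for _ in range(int(t / dt)):
        l = l + dt * (-l / tau + g)
    return l

def closed_form_logit(g, tau, t, l0=0.0):
    """Mode (ii): exact solution l(t) = g*tau + (l0 - g*tau) * exp(-t/tau)."""
    return g * tau + (l0 - g * tau) * np.exp(-t / tau)

def steady_state_logit(g, tau):
    """Mode (iii): fixed point of the ODE (t -> infinity), l* = g*tau."""
    return g * tau
```

The three modes trade accuracy for cost: Euler integration tracks the transient at O(t/dt) steps, the closed form is exact in one evaluation, and the steady-state approximation drops the transient entirely, which is cheapest when the dynamics settle quickly relative to the observation interval.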