🤖 AI Summary
This work addresses a limitation of existing Vision Mamba-based RGB-Event (RGBE) tracking methods: their static state transition matrices cannot adapt to varying sparsity in event streams, so they tend to underfit sparse streams and overfit dense ones, which weakens the robustness of cross-modal fusion. To overcome this, the authors propose MambaTrack, a framework built on a Dynamic State Space Model (DSSM). It combines an event-density-driven mechanism that modulates the state transition rate with a Gated Projection Fusion (GPF) module whose gates are derived from event density and RGB confidence. Evaluated on the FE108 and FELT datasets, MambaTrack achieves state-of-the-art performance, and its lightweight architecture suggests potential for real-time embedded deployment.
📝 Abstract
Existing Vision Mamba-based RGB-Event (RGBE) tracking methods rely on static state transition matrices, which fail to adapt to variations in event sparsity. This rigidity leads to imbalanced modeling (underfitting sparse event streams and overfitting dense ones), which in turn degrades the robustness of cross-modal fusion. To address these limitations, we propose MambaTrack, an efficient multimodal tracking framework built upon a Dynamic State Space Model (DSSM). Our contributions are twofold. First, we introduce an event-adaptive state transition mechanism that dynamically modulates the state transition matrix based on event stream density. A learnable scalar governs the state evolution rate, enabling differentiated modeling of sparse and dense event flows. Second, we develop a Gated Projection Fusion (GPF) module for robust cross-modal integration. This module projects RGB features into the event feature space and generates adaptive gates from event density and RGB confidence scores. These gates control the fusion intensity, suppressing noise while preserving complementary information. Experiments show that MambaTrack achieves state-of-the-art performance on the FE108 and FELT datasets. Its lightweight design suggests potential for real-time embedded deployment.
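The abstract does not give formulas, so the following NumPy sketch is only one plausible reading of the two contributions, not the authors' implementation. Every name and functional form in it is an assumption made for illustration: the density measure (`event_density`), the way the scalar `alpha` scales the discretization step (`step_size`), the diagonal state matrix, the scalar gate built from a weighted sum of density and RGB confidence, and the residual form of the fusion.

```python
import numpy as np


def event_density(event_frame, eps=1e-6):
    """Assumed density measure: fraction of pixels with a non-zero event count."""
    return np.count_nonzero(np.abs(event_frame) > eps) / event_frame.size


def step_size(density, alpha, base_dt=0.1):
    """Assumed form: the scalar `alpha` scales the step size with event density."""
    return base_dt * (1.0 + alpha * density)


def dynamic_ssm_scan(x, A, B, C, density, alpha, base_dt=0.1):
    """Linear state space scan whose transition depends on event density.

    x: (T, D) input sequence, A: (N,) negative diagonal state matrix,
    B: (N, D) input matrix, C: (D_out, N) output matrix.
    """
    dt = step_size(density, alpha, base_dt)
    A_bar = np.exp(dt * A)  # discretized transition; changes with density
    B_bar = dt * B          # simplified (Euler-style) input discretization
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:
        h = A_bar * h + B_bar @ x_t  # state update
        outputs.append(C @ h)        # readout
    return np.stack(outputs)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_projection_fusion(rgb_feat, evt_feat, W_proj, density, rgb_conf,
                            w_gate, b_gate):
    """Project RGB features into the event feature space, then gate the fusion.

    rgb_feat: (T, D_rgb), evt_feat: (T, D_evt), W_proj: (D_rgb, D_evt).
    The gate is a scalar in (0, 1) computed from event density and RGB
    confidence (assumed form); it scales how much projected RGB is added.
    """
    rgb_proj = rgb_feat @ W_proj
    gate = sigmoid(w_gate[0] * density + w_gate[1] * rgb_conf + b_gate)
    return evt_feat + gate * rgb_proj, gate


# Usage with random placeholder data (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
events = rng.normal(size=(8, 16))   # event features, T=8, D_evt=16
rgb = rng.normal(size=(8, 32))      # RGB features, D_rgb=32
density = event_density(rng.integers(0, 2, size=(64, 64)))
A = -np.abs(rng.normal(size=4))     # negative entries keep the state stable
B = rng.normal(size=(4, 16))
C = rng.normal(size=(16, 4))
evt_out = dynamic_ssm_scan(events, A, B, C, density, alpha=0.5)
fused, gate = gated_projection_fusion(
    rgb, evt_out, rng.normal(size=(32, 16)), density, rgb_conf=0.8,
    w_gate=(1.0, 1.0), b_gate=0.0)
```

In this sketch a denser event stream yields a larger step size and therefore a faster-decaying state, while `alpha = 0` recovers a static transition. Whether density should speed up or slow down the state evolution in MambaTrack is not stated in the abstract, so the sign convention here is a guess.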