🤖 AI Summary
This work addresses two problems: RGB-only trackers struggle in complex dynamic scenes, and current RGB-event fusion approaches underuse the high-frequency content and temporal responsiveness of event data. The authors propose FreqTrack, a frequency-aware RGB-event object tracking framework that establishes complementary cross-modal associations through frequency-domain modeling. Its two core components are a Spectral Enhancement Transformer (SET) layer, which uses multi-head dynamic Fourier filtering for explicit frequency-domain feature fusion, and a Wavelet Edge Refinement (WER) module, which uses learnable wavelet transforms to capture multi-scale edge structures from event data. Evaluated on the COESOT and FE108 datasets, the method achieves highly competitive performance, including a leading precision of 76.6% on COESOT, which supports the effectiveness of frequency-domain modeling in multimodal tracking.
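To make "multi-head Fourier filtering" concrete, here is a minimal pure-Python sketch of the general idea, not the paper's SET layer: each head's feature sequence is moved to the frequency domain, each frequency bin is scaled by a gain, and the result is transformed back. The function names (`dft`, `idft`, `fourier_filter`, `multi_head_filter`) and the hand-picked gains are assumptions for illustration. In a dynamic filter the gains would presumably be predicted from the input by a learned network, which is omitted here.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def fourier_filter(signal, gains):
    """Scale each frequency bin of `signal` by the matching gain, then invert."""
    spectrum = dft(signal)
    filtered = [g * c for g, c in zip(gains, spectrum)]
    # For a real signal and symmetric gains (gains[k] == gains[n - k]) the
    # imaginary part is zero up to rounding, so only the real part is kept.
    return [v.real for v in idft(filtered)]


def multi_head_filter(heads, gains_per_head):
    """Apply an independent frequency filter to each head's sequence."""
    return [fourier_filter(sig, g) for sig, g in zip(heads, gains_per_head)]


# Hypothetical example: two heads see the same sequence but use different gains.
heads = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
gains = [
    [1.0, 1.0, 1.0, 1.0],  # head 0: pass every frequency (identity)
    [1.0, 0.0, 0.0, 0.0],  # head 1: keep only the DC bin (the mean)
]
filtered_heads = multi_head_filter(heads, gains)
```

With these gains, head 0 reproduces its input and head 1 collapses to the sequence mean, which shows how different heads can select different frequency bands from the same features.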
📝 Abstract
Existing single-modal RGB trackers often face performance bottlenecks in complex dynamic scenes, while the introduction of event sensors offers new potential for enhancing tracking capabilities. However, most current RGB-event fusion methods are designed primarily in the spatial domain using convolutional, Transformer, or Mamba architectures, and fail to fully exploit the unique temporal response and high-frequency characteristics of event data. To address this, we propose FreqTrack, a frequency-aware RGB-event (RGBE) tracking framework that establishes complementary inter-modal correlations through frequency-domain transformations for more robust feature fusion. We design a Spectral Enhancement Transformer (SET) layer that incorporates multi-head dynamic Fourier filtering to adaptively enhance and select frequency-domain features. Additionally, we develop a Wavelet Edge Refinement (WER) module, which leverages learnable wavelet transforms to explicitly extract multi-scale edge structures from event data, improving modeling capability in high-speed and low-light scenarios. Extensive experiments on the COESOT and FE108 datasets demonstrate that FreqTrack achieves highly competitive performance, notably attaining a leading precision of 76.6% on the COESOT benchmark, validating the effectiveness of frequency-domain modeling for RGBE tracking.
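As a rough illustration of why wavelet transforms suit edge extraction, the sketch below applies a single-level 2D Haar transform to a tiny grid. It is not the WER module: the abstract describes learnable wavelets, whereas this example swaps in the fixed Haar filters, and the function name `haar_dwt2` and the sub-band names are assumptions made for illustration. The detail sub-bands respond only where intensity changes inside a 2x2 block, which is where edges sit.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform of a grid with even height and width.

    Returns four half-resolution sub-bands:
    approx (local average), horiz_diff (left minus right, fires on vertical
    edges), vert_diff (top minus bottom, fires on horizontal edges), and diag.
    """
    approx, horiz_diff, vert_diff, diag = [], [], [], []
    for i in range(0, len(img), 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            row_a.append((a + b + c + d) / 2)
            row_h.append((a - b + c - d) / 2)
            row_v.append((a + b - c - d) / 2)
            row_d.append((a - b - c + d) / 2)
        approx.append(row_a)
        horiz_diff.append(row_h)
        vert_diff.append(row_v)
        diag.append(row_d)
    return approx, horiz_diff, vert_diff, diag


# Hypothetical 4x4 "event frame" with a vertical edge between columns 0 and 1.
frame = [[0, 1, 1, 1] for _ in range(4)]
approx, horiz_diff, vert_diff, diag = haar_dwt2(frame)
# horiz_diff is nonzero only in the blocks that contain the vertical edge;
# vert_diff and diag stay zero because nothing changes from top to bottom.
```

Applying the same transform again to `approx` would give the next coarser scale, which is the usual way a wavelet decomposition yields multi-scale edge responses.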