🤖 AI Summary
InceptionNeXt exhibits strong performance but suffers from limitations inherent to its parallel 1D strip convolutions: inadequate modeling of spatial dependencies across dimensions, insufficient representation of local neighborhoods, and weak global context modeling due to the locality of convolution. To address these issues, we propose InceptionMamba, a novel hybrid backbone architecture. It replaces conventional strip convolutions with orthogonal band convolutions, enabling efficient, cohesive spatial modeling across dimensions. Additionally, a bottleneck Mamba module enhances cross-channel feature fusion and enlarges the effective receptive field, improving semantic understanding while preserving fine-grained local details. Extensive experiments show that InceptionMamba achieves state-of-the-art performance on image classification and multiple downstream vision tasks, with superior parameter and computational efficiency.
📝 Abstract
Within the family of convolutional neural networks, InceptionNeXt has shown highly competitive performance in image classification and a number of downstream tasks. However, because it is built on parallel one-dimensional strip convolutions, it has a limited ability to capture spatial dependencies along different dimensions and does not fully exploit spatial modeling within local neighborhoods. Moreover, the inherent locality of convolution operations hinders effective global context modeling. To overcome these limitations, we propose a novel backbone architecture termed InceptionMamba. Specifically, InceptionMamba replaces the traditional one-dimensional strip convolutions with orthogonal band convolutions to achieve cohesive spatial modeling. Furthermore, a bottleneck Mamba module provides global context modeling, facilitating enhanced cross-channel information fusion and an enlarged receptive field. Extensive evaluations on classification and various downstream tasks demonstrate that InceptionMamba achieves state-of-the-art performance with superior parameter and computational efficiency. The source code will be available at https://github.com/Wake1021/InceptionMamba.
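To make the "local neighborhood" limitation of strip convolutions concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give kernel sizes, so it assumes an InceptionNeXt-style 1×11 strip kernel and a hypothetical 3×11 band kernel, treats each as a single-channel depthwise filter with all-ones weights, and feeds it a unit impulse to see which spatial offsets contribute to the output. The helper names (`conv2d_same`, `impulse_footprint`, `local_coverage`) are illustrative only.

```python
# Hypothetical illustration (assumed kernel shapes, not the paper's exact
# configuration): which positions around a pixel does a strip vs. band kernel see?


def conv2d_same(img, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size).

    Assumes odd kernel height and width so the kernel has a well-defined centre.
    """
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - ph, j + v - pw
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the image
                        acc += kernel[u][v] * img[y][x]
            out[i][j] = acc
    return out


def impulse_footprint(kh, kw, size=15):
    """Offsets (dy, dx) from the centre that respond to a unit impulse.

    With an all-ones kernel there is no cancellation, so the nonzero outputs
    mark exactly the spatial extent of a kh x kw kernel.
    """
    c = size // 2
    img = [[0.0] * size for _ in range(size)]
    img[c][c] = 1.0
    kernel = [[1.0] * kw for _ in range(kh)]
    out = conv2d_same(img, kernel)
    return {(i - c, j - c) for i in range(size) for j in range(size) if out[i][j] != 0.0}


def local_coverage(offsets):
    """How many of the 9 positions in the centred 3x3 neighbourhood are covered."""
    return sum((dy, dx) in offsets for dy in (-1, 0, 1) for dx in (-1, 0, 1))


shapes = {"strip 1x11": (1, 11), "band 3x11": (3, 11), "square 11x11": (11, 11)}
for name, (kh, kw) in shapes.items():
    fp = impulse_footprint(kh, kw)
    print(f"{name}: footprint={len(fp)}, 3x3 coverage={local_coverage(fp)}/9, weights={kh * kw}")
```

Under these assumptions the 1×11 strip kernel sees only 3 of the 9 positions in the immediate 3×3 neighborhood, whereas the 3×11 band kernel covers all 9 while using 33 weights per channel, roughly 27% of a full 11×11 depthwise kernel. The orthogonal counterparts (11×1 and 11×3) behave symmetrically along the other axis.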