InceptionMamba: An Efficient Hybrid Network with Large Band Convolution and Bottleneck Mamba

📅 2025-06-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
InceptionNeXt exhibits strong performance but suffers from limitations inherent to its 1D stripe convolutions: inadequate modeling of multidimensional spatial dependencies, insufficient local neighborhood representation, and weak global contextual modeling due to convolutional locality. To address these issues, we propose InceptionMamba, a novel hybrid backbone architecture. It introduces orthogonal wide-band convolutions—replacing conventional stripe convolutions—to enable efficient, cooperative spatial modeling across dimensions. Additionally, we design a bottlenecked Mamba module that enhances cross-channel feature fusion and expands the effective receptive field, thereby improving semantic understanding while preserving fine-grained local details. Extensive experiments demonstrate that InceptionMamba achieves state-of-the-art performance on image classification and multiple downstream vision tasks, with significant gains in both parameter efficiency and computational efficiency.

📝 Abstract
Within the family of convolutional neural networks, InceptionNeXt has shown excellent competitiveness in image classification and a number of downstream tasks. Built on parallel one-dimensional strip convolutions, however, it has a limited ability to capture spatial dependencies along different dimensions and fails to fully exploit spatial modeling in local neighborhoods. Moreover, the inherent locality of convolution operations is detrimental to effective global context modeling. To overcome these limitations, we propose a novel backbone architecture termed InceptionMamba in this study. More specifically, the traditional one-dimensional strip convolutions are replaced by orthogonal band convolutions in InceptionMamba to achieve cohesive spatial modeling. Furthermore, global contextual modeling is achieved via a bottleneck Mamba module, facilitating enhanced cross-channel information fusion and an enlarged receptive field. Extensive evaluations on classification and various downstream tasks demonstrate that the proposed InceptionMamba achieves state-of-the-art performance with superior parameter and computational efficiency. The source code will be available at https://github.com/Wake1021/InceptionMamba.
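The contrast between 1D strip convolutions and the orthogonal band convolutions described above can be illustrated with a minimal sketch. This is a conceptual toy in pure Python, not the paper's implementation: the kernel widths, the all-ones weights, and the helper names (`strip_kernels`, `band_kernels`) are assumptions for illustration only.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation on a list-of-lists image (toy, single channel)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

def strip_kernels(k):
    # InceptionNeXt-style 1D strips: a 1 x k and a k x 1 kernel.
    # Each output position only aggregates a single row or column.
    return [[1.0] * k], [[1.0] for _ in range(k)]

def band_kernels(k, b):
    # Band-style orthogonal kernels: b x k and k x b with band width b > 1
    # (widths here are illustrative). Each output position aggregates a
    # genuine 2D neighborhood, not just one line of pixels.
    return [[1.0] * k for _ in range(b)], [[1.0] * b for _ in range(k)]
```

The point of the sketch: a strip kernel sees only one spatial dimension at a time, while a band kernel of width `b > 1` covers a `b x k` patch, so the two orthogonal branches model the local neighborhood cooperatively rather than independently.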
Problem

Research questions and friction points this paper is trying to address.

Limited spatial dependency capture in InceptionNeXt
Ineffective global context modeling due to locality
Need cohesive spatial and cross-channel fusion enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replaces strip convolutions with band convolutions
Introduces bottleneck Mamba for global context
Enhances cross-channel fusion and receptive field
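The bottleneck Mamba idea listed above (squeeze channels, mix tokens globally, expand back) can be sketched as follows. This is a hedged stand-in, not the paper's module: a real Mamba block uses learned, input-dependent state-space parameters and projections, whereas here a fixed `decay` scan and slice/broadcast stand in for the selective scan and the 1x1 bottleneck convolutions, and the function name `bottleneck_mamba_mixer` is hypothetical.

```python
def bottleneck_mamba_mixer(tokens, reduce_ratio=4, decay=0.9):
    """Conceptual sketch of a bottlenecked global mixer.

    tokens: list of token vectors (list of floats), one per spatial position.
    1) Squeeze channels by reduce_ratio (stand-in for a 1x1 reduction conv).
    2) Run a recurrent scan over the sequence, h_t = decay*h_{t-1} + x_t,
       so every token carries information from all earlier tokens
       (a crude proxy for Mamba's selective state-space scan).
    3) Expand back to the original channel count (stand-in for a 1x1
       expansion conv), which also mixes information across channels.
    """
    c = len(tokens[0])
    cr = max(1, c // reduce_ratio)
    squeezed = [t[:cr] for t in tokens]          # channel bottleneck
    state = [0.0] * cr
    mixed = []
    for t in squeezed:                           # global sequential mixing
        state = [decay * h + x for h, x in zip(state, t)]
        mixed.append(list(state))
    # expand: tile the reduced channels back out to c channels
    return [[m[i % cr] for i in range(c)] for m in mixed]
```

The design point the sketch tries to convey: because the scan runs over the whole token sequence, the effective receptive field is global even though the per-step cost stays linear, and operating in a reduced channel space keeps the mixer cheap.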
Yuhang Wang
School of Computer and Electronic Information, Nanjing Normal University, China
Jun Li
School of Computer and Electronic Information, Nanjing Normal University, China
Zhijian Wu
School of Data Science and Engineering, East China Normal University, China
Jianhua Xu
University of Electronic Science and Technology of China