InceptionMamba: An Efficient Hybrid Network with Large Band Convolution and Bottleneck Mamba

📅 2025-06-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
InceptionNeXt exhibits strong performance but suffers from limitations inherent to its 1D stripe convolutions: inadequate modeling of multidimensional spatial dependencies, insufficient local neighborhood representation, and weak global contextual modeling due to convolutional locality. To address these issues, we propose InceptionMamba, a novel hybrid backbone architecture. It introduces orthogonal wide-band convolutions—replacing conventional stripe convolutions—to enable efficient, cooperative spatial modeling across dimensions. Additionally, we design a bottlenecked Mamba module that enhances cross-channel feature fusion and expands the effective receptive field, thereby improving semantic understanding while preserving fine-grained local details. Extensive experiments demonstrate that InceptionMamba achieves state-of-the-art performance on image classification and multiple downstream vision tasks, with significant gains in both parameter efficiency and computational efficiency.

📝 Abstract
Within the family of convolutional neural networks, InceptionNeXt has shown excellent competitiveness in image classification and a number of downstream tasks. Built on parallel one-dimensional strip convolutions, however, it has a limited ability to capture spatial dependencies along different dimensions and fails to fully exploit spatial modeling in local neighborhoods. Moreover, the inherent locality of convolution operations is detrimental to effective global context modeling. To overcome these limitations, we propose a novel backbone architecture termed InceptionMamba in this study. More specifically, the traditional one-dimensional strip convolutions are replaced by orthogonal band convolutions in InceptionMamba to achieve cohesive spatial modeling. Furthermore, global contextual modeling is achieved via a bottleneck Mamba module, facilitating enhanced cross-channel information fusion and an enlarged receptive field. Extensive evaluations on classification and various downstream tasks demonstrate that the proposed InceptionMamba achieves state-of-the-art performance with superior parameter and computational efficiency. The source code will be available at https://github.com/Wake1021/InceptionMamba.
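The contrast between 1D strip convolutions and the orthogonal band convolutions described above can be illustrated with a minimal sketch. This is a conceptual toy in pure Python, not the paper's implementation: the kernel widths, the all-ones weights, and the helper names (`strip_kernels`, `band_kernels`) are assumptions for illustration only.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation on a list-of-lists image (toy, single channel)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

def strip_kernels(k):
    # InceptionNeXt-style 1D strips: a 1 x k and a k x 1 kernel.
    # Each output position only aggregates a single row or column.
    return [[1.0] * k], [[1.0] for _ in range(k)]

def band_kernels(k, b):
    # Band-style orthogonal kernels: b x k and k x b with band width b > 1
    # (widths here are illustrative). Each output position aggregates a
    # genuine 2D neighborhood, not just one line of pixels.
    return [[1.0] * k for _ in range(b)], [[1.0] * b for _ in range(k)]
```

The point of the sketch: a strip kernel sees only one spatial dimension at a time, while a band kernel of width `b > 1` covers a `b x k` patch, so the two orthogonal branches model the local neighborhood cooperatively rather than independently.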
Problem

Research questions and friction points this paper is trying to address.

Limited spatial dependency capture in InceptionNeXt
Ineffective global context modeling due to locality
Need cohesive spatial and cross-channel fusion enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replaces strip convolutions with band convolutions
Introduces bottleneck Mamba for global context
Enhances cross-channel fusion and receptive field
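The bottleneck Mamba idea listed above (squeeze channels, mix tokens globally, expand back) can be sketched as follows. This is a hedged stand-in, not the paper's module: a real Mamba block uses learned, input-dependent state-space parameters and projections, whereas here a fixed `decay` scan and slice/broadcast stand in for the selective scan and the 1x1 bottleneck convolutions, and the function name `bottleneck_mamba_mixer` is hypothetical.

```python
def bottleneck_mamba_mixer(tokens, reduce_ratio=4, decay=0.9):
    """Conceptual sketch of a bottlenecked global mixer.

    tokens: list of token vectors (list of floats), one per spatial position.
    1) Squeeze channels by reduce_ratio (stand-in for a 1x1 reduction conv).
    2) Run a recurrent scan over the sequence, h_t = decay*h_{t-1} + x_t,
       so every token carries information from all earlier tokens
       (a crude proxy for Mamba's selective state-space scan).
    3) Expand back to the original channel count (stand-in for a 1x1
       expansion conv), which also mixes information across channels.
    """
    c = len(tokens[0])
    cr = max(1, c // reduce_ratio)
    squeezed = [t[:cr] for t in tokens]          # channel bottleneck
    state = [0.0] * cr
    mixed = []
    for t in squeezed:                           # global sequential mixing
        state = [decay * h + x for h, x in zip(state, t)]
        mixed.append(list(state))
    # expand: tile the reduced channels back out to c channels
    return [[m[i % cr] for i in range(c)] for m in mixed]
```

The design point the sketch tries to convey: because the scan runs over the whole token sequence, the effective receptive field is global even though the per-step cost stays linear, and operating in a reduced channel space keeps the mixer cheap.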
Yuhang Wang
School of Computer and Electronic Information, Nanjing Normal University, China
Jun Li
School of Computer and Electronic Information, Nanjing Normal University, China
Zhijian Wu
School of Data Science and Engineering, East China Normal University, China
Jianhua Xu
University of Electronic Science and Technology of China