LASFNet: A Lightweight Attention-Guided Self-Modulation Feature Fusion Network for Multimodal Object Detection

📅 2025-06-26

🤖 AI Summary

To address redundancy, training complexity, and high computational cost in feature fusion modules for multimodal object detection, this paper proposes LASFNet, a lightweight attention-guided self-modulation feature fusion network. It makes three contributions: (1) a fusion detection baseline built on a single feature-level fusion unit, which replaces the conventional stack of fusion units and simplifies training; (2) an attention-guided self-modulation feature fusion (ASFF) module that adjusts the responses of fused features at both global and local levels using attention information from the different modalities; and (3) a lightweight feature attention transformation module (FATM), placed at the neck of the network, that strengthens the focus on fused features and minimizes information loss. On three representative datasets, LASFNet improves detection accuracy (mAP) by 1% to 3% over state-of-the-art methods while reducing parameters by as much as 90% and computational cost by as much as 85%, a favorable efficiency-accuracy trade-off.

📝 Abstract

Effective deep feature extraction via feature-level fusion is crucial for multimodal object detection. However, previous studies often involve complex training processes that integrate modality-specific features by stacking multiple feature-level fusion units, leading to significant computational overhead. To address this issue, we propose a new fusion detection baseline that uses a single feature-level fusion unit to enable high-performance detection, thereby simplifying the training process. Based on this approach, we propose a lightweight attention-guided self-modulation feature fusion network (LASFNet), which introduces a novel attention-guided self-modulation feature fusion (ASFF) module that adaptively adjusts the responses of fusion features at both global and local levels based on attention information from different modalities, thereby promoting comprehensive and enriched feature generation. Additionally, a lightweight feature attention transformation module (FATM) is designed at the neck of LASFNet to enhance the focus on fused features and minimize information loss. Extensive experiments on three representative datasets demonstrate that, compared to state-of-the-art methods, our approach achieves a favorable efficiency-accuracy trade-off, reducing the number of parameters and computational cost by as much as 90% and 85%, respectively, while improving detection accuracy (mAP) by 1%-3%. The code will be open-sourced at https://github.com/leileilei2000/LASFNet.
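The idea of modulating fused features at both global and local levels can be illustrated with a minimal sketch. It is not the paper's ASFF module, whose internals the abstract does not specify. It assumes two same-shaped feature maps stored as nested lists (channels x height x width), a global gate taken from per-channel average pooling, and a local gate taken from the per-pixel mean across channels. The names `mod_a`, `mod_b`, `global_gate`, `local_gate`, and `fuse` are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def global_gate(feat):
    """Assumed global attention: one gate per channel from global average pooling."""
    gates = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates


def local_gate(feat):
    """Assumed local attention: one gate per pixel from the mean across channels."""
    n_ch = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    return [
        [sigmoid(sum(feat[c][i][j] for c in range(n_ch)) / n_ch) for j in range(width)]
        for i in range(height)
    ]


def fuse(mod_a, mod_b):
    """Weight each modality by its own global and local gates, then sum.

    Illustrative only: a real module would learn these gates with trainable layers.
    """
    g_a, g_b = global_gate(mod_a), global_gate(mod_b)
    l_a, l_b = local_gate(mod_a), local_gate(mod_b)
    fused = []
    for c in range(len(mod_a)):
        channel = []
        for i in range(len(mod_a[c])):
            row = []
            for j in range(len(mod_a[c][i])):
                w_a = g_a[c] * l_a[i][j]  # global x local weight for modality A
                w_b = g_b[c] * l_b[i][j]  # global x local weight for modality B
                row.append(w_a * mod_a[c][i][j] + w_b * mod_b[c][i][j])
            channel.append(row)
        fused.append(channel)
    return fused


# Toy inputs: 2 channels, 2 x 2 spatial grid per modality.
mod_a = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
mod_b = [[[0.0, 2.0], [2.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
fused = fuse(mod_a, mod_b)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

In this sketch the gates are fixed functions of the inputs, so it only shows how channel-level and pixel-level weights combine. The fusion happens once, in a single unit, which mirrors the single-fusion-unit baseline described in the abstract.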

Problem

Research questions and friction points this paper is trying to address.

Simplifies multimodal object detection training with a single fusion unit

Reduces computational overhead while maintaining high detection accuracy

Enhances feature fusion via lightweight attention-guided modulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Single feature-level fusion unit simplifies training

Attention-guided self-modulation adapts fusion features

Lightweight feature attention minimizes information loss
