🤖 AI Summary
Small-object detection in UAV imagery is hindered by extremely small object sizes, low signal-to-noise ratios, and cluttered backgrounds; existing multi-scale approaches often sacrifice fine-grained detail or incur excessive computational overhead. To address this, we propose a lightweight multi-scale global–local feature fusion framework built around a novel FusionLock mechanism. This mechanism jointly integrates Token-Statistics self-attention (for long-range semantic modeling), directional convolution with parallel attention (to enhance local structural perception), and dynamic pixel-wise weighting (to suppress background interference), enabling efficient and precise global–local feature coupling. Evaluated on the VisDrone benchmark, the method consistently outperforms state-of-the-art approaches across diverse backbone networks and detector architectures, achieving significant gains in both precision and recall while maintaining real-time inference speed, making it well-suited for resource-constrained onboard UAV platforms.
📝 Abstract
Small-object detection in UAV imagery is crucial for applications such as search-and-rescue, traffic monitoring, and environmental surveillance, but it is hampered by tiny object sizes, low signal-to-noise ratios, and weak feature representations. Existing multi-scale fusion methods help, but they add computational burden and blur fine details, making small-object detection in cluttered scenes difficult. To overcome these challenges, we propose the Multi-scale Global-detail Feature Integration Strategy (MGDFIS), a unified fusion framework that tightly couples global context with local detail to boost detection performance while maintaining efficiency. MGDFIS comprises three synergistic modules: the FusionLock-TSS Attention Module, which marries token-statistics self-attention with DynamicTanh normalization to highlight spectral and spatial cues at minimal cost; the Global-detail Integration Module, which fuses multi-scale context via directional convolution and parallel attention while preserving subtle shape and texture variations; and the Dynamic Pixel Attention Module, which generates pixel-wise weighting maps to rebalance uneven foreground and background distributions and sharpen responses to true object regions. Extensive experiments on the VisDrone benchmark demonstrate that MGDFIS consistently outperforms state-of-the-art methods across diverse backbone architectures and detection frameworks, achieving superior precision and recall with low inference time. By striking a practical balance between accuracy and resource usage, MGDFIS offers an effective solution for small-object detection on resource-constrained UAV platforms.
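To make the pixel-wise weighting idea behind the Dynamic Pixel Attention Module concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the simplest possible form: a 1×1 channel projection produces one logit per pixel, a sigmoid turns it into a gate in [0, 1], and the gate is broadcast over channels to down-weight background pixels. The function name and the projection parameters `w` and `b` are hypothetical names introduced here for illustration.

```python
import numpy as np

def dynamic_pixel_attention(feat, w, b):
    """Hypothetical sketch of a pixel-wise weighting map.

    feat: (C, H, W) feature map
    w:    (C,) projection weights (a 1x1 conv collapsing C channels to 1 logit)
    b:    scalar bias
    Returns the feature map reweighted per pixel, same shape as `feat`.
    """
    # Per-pixel logit: weighted sum over the channel axis -> (H, W)
    logits = np.tensordot(w, feat, axes=([0], [0])) + b
    # Sigmoid gate in (0, 1): large where foreground evidence is strong
    gate = 1.0 / (1.0 + np.exp(-logits))
    # Broadcast the single-channel gate over all channels
    return feat * gate[None, :, :]
```

In the actual module the gate would be learned end-to-end with the detector; the point of the sketch is only that the weighting is resolved per pixel rather than per channel or per scale, which is what lets it rebalance sparse foreground against dominant background.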