SFFNet: Synergistic Feature Fusion Network With Dual-Domain Edge Enhancement for UAV Image Object Detection

2026-04-03
Citations: 0
Influential: 0
AI Summary
This work addresses two challenges of object detection in UAV imagery: complex background clutter and large scale variation among targets. The authors propose SFFNet, an architecture that decouples multi-scale object edges from background noise through collaborative edge enhancement in the frequency and spatial domains. Its multi-scale dynamic dual-domain coupling (MDDC) module performs the dual-domain edge extraction, and a synergistic feature pyramid network (SFPN) combines linear deformable convolution with a wide-area perception module (WPM) to strengthen geometric and semantic representations in the neck. Six detector variants of different sizes (N/S/M/B/L/X) cover a range of applications and resource constraints. On the VisDrone and UAVDT benchmarks, SFFNet-X achieves 36.8 AP and 20.6 AP, respectively, while the lightweight N and S models balance detection accuracy against parameter efficiency.
Abstract
Object detection in unmanned aerial vehicle (UAV) images remains a highly challenging task, primarily because of complex background noise and imbalanced target scales. Traditional methods often struggle to separate objects from intricate backgrounds and fail to fully leverage the rich multi-scale information contained within images. To address these issues, we have developed a synergistic feature fusion network (SFFNet) with dual-domain edge enhancement, specifically tailored for object detection in UAV images. First, the multi-scale dynamic dual-domain coupling (MDDC) module is designed. This component introduces a dual-driven edge extraction architecture that operates in both the frequency and spatial domains, enabling effective decoupling of multi-scale object edges from background noise. Second, to further enhance the capability of the model's neck to represent both geometric and semantic information, a synergistic feature pyramid network (SFPN) is proposed. SFPN leverages linear deformable convolutions to adaptively capture irregular object shapes and establishes long-range contextual associations around targets through the designed wide-area perception module (WPM). Moreover, to adapt to various applications and resource-constrained scenarios, six detectors of different scales (N/S/M/B/L/X) are designed. Experiments on two challenging aerial datasets (VisDrone and UAVDT) demonstrate the outstanding performance of SFFNet-X, which achieves 36.8 AP and 20.6 AP, respectively. The lightweight models (N/S) also maintain a balance between detection accuracy and parameter efficiency. The code will be available at https://github.com/CQNU-ZhangLab/SFFNet.
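The abstract does not say how MDDC computes its edge responses, so the following is only a rough illustration of what "edge extraction in both the frequency and spatial domains" can mean in general. This NumPy sketch pairs a fixed Sobel filter (spatial domain) with a fixed ideal high-pass filter applied through the FFT (frequency domain) and blends the two maps. The function names and the `cutoff` and `alpha` parameters are assumptions made for this example and are not taken from the paper. The module described in the abstract is dynamic and multi-scale, which this fixed-filter toy does not attempt to reproduce.

```python
import numpy as np


def spatial_edges(img):
    """Spatial-domain edge response: Sobel gradient magnitude."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")  # replicate borders to keep the size
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Correlate with each 3x3 kernel by summing shifted views of the image.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


def frequency_edges(img, cutoff=0.1):
    """Frequency-domain edge response: ideal high-pass filter via the FFT.

    `cutoff` (cycles per pixel) is a hypothetical parameter for this sketch.
    """
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    mask = radius > cutoff  # drop low frequencies, including the DC term
    spectrum = np.fft.fft2(img)
    return np.abs(np.fft.ifft2(spectrum * mask))


def fuse_edges(img, alpha=0.5, cutoff=0.1):
    """Blend the two normalized edge maps; `alpha` weights the spatial branch."""
    s = spatial_edges(img)
    f = frequency_edges(img, cutoff)
    s = s / (s.max() + 1e-8)
    f = f / (f.max() + 1e-8)
    return alpha * s + (1.0 - alpha) * f


# Toy input: a bright square on a dark background.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
edge_map = fuse_edges(img)
```

In this toy example both branches respond along the border of the square and stay small in its flat interior, which is the sense in which edge cues can help separate an object from a smooth background.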
Problem

Research questions and friction points this paper is trying to address.

UAV image object detection
background noise
scale imbalance
multi-scale information
object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-domain edge enhancement
synergistic feature fusion
multi-scale dynamic dual-domain coupling
linear deformable convolution
wide-area perception module
Authors

Wenfeng Zhang
Chongqing Normal University
Computer Vision, Multi-modal Learning
Jun Ni
College of Computer and Information Science, Chongqing Normal University, Chongqing 401333, China
Yue Meng
Massachusetts Institute of Technology
Control, Perception, Robotics
Xiaodong Pei
CETC Yizhihang (Chongqing) Technology Co., Ltd, Chongqing 400031, China
Wei Hu
College of Computer and Information Science, Chongqing Normal University, Chongqing 401333, China
Qibing Qin
Weifang University
Cross-modal retrieval, Deep hashing, Computer vision
Lei Huang
Ocean University of China