O2Former: Direction-Aware and Multi-Scale Query Enhancement for SAR Ship Instance Segmentation

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address challenges in SAR image ship instance segmentation—including large scale variations, high object density, and ambiguous boundaries—this paper proposes an end-to-end Mask2Former-based framework. The method introduces two key innovations: (1) an Optimized Query Generator (OQG) that enables SAR-specific query initialization and geometric alignment; and (2) an Orientation-Aware Embedding Module (OAEM), which integrates polar coordinate encoding with orientation-aware convolution to model azimuth-sensitive multi-scale position–semantic interactions. By synergistically combining multi-scale feature fusion and a Transformer decoder, the approach enhances segmentation accuracy for small and arbitrarily oriented ships. Evaluated on multiple SAR ship datasets, the method achieves a 5.2% mAP improvement over state-of-the-art methods, demonstrating particularly strong performance on small-object and orientation-variant ship segmentation tasks.

📝 Abstract
Instance segmentation of ships in synthetic aperture radar (SAR) imagery is critical for applications such as maritime monitoring, environmental analysis, and national security. SAR ship images present challenges including scale variation, object density, and fuzzy target boundaries, which existing methods often overlook, leading to suboptimal performance. In this work, we propose O2Former, a tailored instance segmentation framework that extends Mask2Former by fully leveraging the structural characteristics of SAR imagery. We introduce two key components. The first is the Optimized Query Generator (OQG), which enables multi-scale feature interaction by jointly encoding shallow positional cues and high-level semantic information, improving query quality and convergence efficiency. The second is the Orientation-Aware Embedding Module (OAEM), which enhances directional sensitivity through direction-aware convolution and polar-coordinate encoding, addressing the challenge of uneven target orientations in SAR scenes. Together, these modules facilitate precise feature alignment from backbone to decoder and strengthen the model's capacity to capture fine-grained structural details. Extensive experiments demonstrate that O2Former outperforms state-of-the-art instance segmentation baselines, validating its effectiveness and generalization on SAR ship datasets.
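The abstract mentions polar-coordinate encoding but does not specify its form. As a rough illustration of the general idea only, the sketch below maps each cell of a feature grid to a normalized radius plus the sine and cosine of its angle about the grid centre. The function name `polar_position_encoding`, the choice of the grid centre as origin, and the three-channel output are all assumptions made for this example; they are not taken from the paper and should not be read as the OAEM implementation.

```python
import math


def polar_position_encoding(height, width):
    """Return an H x W grid of (radius, sin(theta), cos(theta)) tuples.

    Hypothetical helper (not from the paper): coordinates are measured
    from the grid centre, and the radius is normalised to [0, 1] by the
    distance from the centre to a corner.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    max_r = math.hypot(cy, cx) or 1.0  # fall back to 1.0 on a 1x1 grid
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            dy, dx = y - cy, x - cx
            r = math.hypot(dy, dx) / max_r
            theta = math.atan2(dy, dx)  # angle in (-pi, pi]
            # sin/cos avoid the discontinuity of the raw angle at +/- pi
            row.append((r, math.sin(theta), math.cos(theta)))
        grid.append(row)
    return grid


encoding = polar_position_encoding(3, 3)
```

Encoding the angle as a sine and cosine pair, rather than as the raw angle, keeps neighbouring directions numerically close, which is one common reason to prefer it when orientation matters.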
Problem

Research questions and friction points this paper is trying to address.

Addresses SAR ship instance segmentation challenges
Improves query quality with multi-scale feature interaction
Enhances directional sensitivity for uneven target orientations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale feature interaction via Optimized Query Generator
Direction sensitivity with Orientation-Aware Embedding Module
Precise feature alignment from backbone to decoder
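
The page does not describe how the Optimized Query Generator combines shallow and deep features. Purely as a toy illustration of "jointly encoding shallow positional cues and high-level semantic information", the sketch below seeds queries from the highest-scoring cells of a coarse map and attaches each cell's normalized position and the matching fine-resolution feature. The function `init_queries`, the per-cell score map, and the fixed 2x resolution gap are hypothetical choices for this example, not the paper's design.

```python
def init_queries(deep_scores, shallow_feats, num_queries):
    """Pick the top-scoring deep cells and pair each with shallow cues.

    Hypothetical sketch (not the paper's OQG).
    deep_scores:   H x W list of floats (assumed per-cell semantic score).
    shallow_feats: 2H x 2W list of feature vectors (lists of floats).
    Returns query vectors of the form [score, y_norm, x_norm, *shallow].
    """
    h, w = len(deep_scores), len(deep_scores[0])
    cells = [(deep_scores[y][x], y, x) for y in range(h) for x in range(w)]
    cells.sort(key=lambda c: c[0], reverse=True)  # highest score first
    queries = []
    for score, y, x in cells[:num_queries]:
        # Assumed 2x resolution gap: deep cell (y, x) maps to shallow (2y, 2x).
        shallow = shallow_feats[2 * y][2 * x]
        queries.append([score, y / max(h - 1, 1), x / max(w - 1, 1), *shallow])
    return queries


deep = [[0.1, 0.9], [0.4, 0.2]]
shallow = [[[float(4 * y + x)] for x in range(4)] for y in range(4)]
queries = init_queries(deep, shallow, num_queries=2)
```

Here each query carries both where a likely target sits and what the fine-resolution feature looks like at that spot; a learned model would replace the hand-built concatenation with trainable projections.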
F. Gao
College of Electronic Information Engineering, Beihang University, Beijing 100083, China
Y. Li
College of Electronic Information Engineering, Beihang University, Beijing 100083, China
X. He
College of Electronic Information Engineering, Beihang University, Beijing 100083, China
J. Sun
College of Electronic Information Engineering, Beihang University, Beijing 100083, China
J. Wang
Peking University
Machine learning, artificial intelligence, pattern recognition, deep learning, audio signal