CR-Seg: Attention-Guided and CoT-Enhanced Coarse-to-Refined Reasoning Segmentation

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of existing reasoning segmentation methods, which segment target objects described by complex language. Methods that bridge Multimodal Large Language Models (MLLMs) and segmentation models through learned semantic tokens suffer from difficult cross-modal alignment, while methods that pass explicit spatial prompts such as bounding boxes may lose holistic response semantics. The authors propose CR-Seg, a two-stage coarse-to-refined framework. An Extract Attention Maps and Points (EAP) module extracts attention maps for coarse target localization and selects informative points; both are fed into the Segment Anything Model (SAM) for mask refinement. A Global-to-Local Chain-of-Thought (GLCoT) mechanism guides the model to reason progressively from global scene context to local target details, with the aim of reducing reasoning-answer inconsistency. Experiments on reasoning segmentation benchmarks demonstrate the effectiveness of the approach.

📝 Abstract

Reasoning segmentation aims to segment target objects described by complex language through joint visual-textual reasoning. Existing methods typically rely on either learned semantic tokens to bridge Multimodal Large Language Models (MLLMs) and segmentation models, suffering from difficult cross-modal alignment, or explicit spatial prompts such as bounding boxes, which may lose holistic response semantics. To address these limitations, we propose Attention-Guided and CoT-Enhanced Coarse-to-Refined Reasoning Segmentation, termed CR-Seg, a two-stage framework. Specifically, we design an Extract Attention Maps and Points (EAP) module to extract attention maps for coarse target localization and select informative points, both of which are fed into the Segment Anything Model (SAM) for mask refinement. To alleviate reasoning-answer inconsistency, we further introduce Global-to-Local Chain-of-Thought (GLCoT), which guides the model to reason progressively from global scene context to local target details. Extensive experiments on reasoning segmentation benchmarks demonstrate the effectiveness of CR-Seg.
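
The abstract does not specify how EAP turns an attention map into prompts, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a single 2-D attention map is already available, thresholds it to get a coarse region and bounding box, and greedily picks the strongest peaks as point prompts. The function name `coarse_prompts_from_attention` and the parameters `num_points`, `rel_threshold`, and `min_distance` are hypothetical.

```python
import numpy as np


def coarse_prompts_from_attention(attn, num_points=3, rel_threshold=0.5, min_distance=2):
    """Derive a coarse mask, a box, and point prompts from a 2-D attention map.

    Illustrative only: the thresholding and greedy peak selection below are
    assumptions, not the procedure described in the paper.
    """
    attn = np.asarray(attn, dtype=np.float64)
    if attn.ndim != 2:
        raise ValueError("expected a 2-D attention map")

    lo, hi = attn.min(), attn.max()
    if hi == lo:
        raise ValueError("attention map is constant; no target can be localized")

    # Min-max normalize so the threshold is relative to the strongest response.
    norm = (attn - lo) / (hi - lo)

    # Coarse localization: keep cells at or above the relative threshold.
    mask = norm >= rel_threshold
    ys, xs = np.nonzero(mask)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))  # x0, y0, x1, y1

    # Point selection: repeatedly take the strongest remaining cell, then
    # suppress its neighborhood so the points spread over the region.
    work = norm.copy()
    points = []
    for _ in range(num_points):
        y, x = divmod(int(np.argmax(work)), work.shape[1])
        if work[y, x] < rel_threshold:
            break  # nothing salient left
        points.append((x, y))
        y0, x0 = max(0, y - min_distance), max(0, x - min_distance)
        work[y0:y + min_distance + 1, x0:x + min_distance + 1] = -1.0

    return mask, box, points
```

The returned coordinates are in attention-map resolution, so they would need to be rescaled to image resolution before being handed to a promptable segmenter such as SAM. Suppressing a neighborhood around each chosen peak is one simple way to keep the points from clustering on a single hot spot; other selection rules are equally compatible with the abstract's description.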

Problem

Research questions and friction points this paper is trying to address.

reasoning segmentation

cross-modal alignment

spatial prompts

holistic semantics

visual-textual reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Coarse-to-Refined Reasoning

Attention-Guided Segmentation

Chain-of-Thought Reasoning