🤖 AI Summary
Existing camouflaged image generation methods produce images with low visual realism and backgrounds that are semantically inconsistent with the foreground, which limits their usefulness as training data for camouflaged object detection (COD). To address these issues, we propose a layout-controllable generation framework guided jointly by text and image conditions. The method is a unified out-painting architecture that combines a fine-grained textual task description with texture-oriented background retrieval and adds explicit layout control over the global image structure. We further introduce a background-foreground distribution divergence metric to quantify how effectively a generated image camouflages its foreground. By preserving semantic coherence between foreground and background, the approach aims to improve both visual fidelity and camouflage quality. Experiments and qualitative visualizations support the effectiveness of the framework.
📝 Abstract
Camouflaged image generation (CIG) has recently emerged as an efficient way to acquire high-quality training data for camouflaged object detection (COD). However, a substantial gap remains between the outputs of existing CIG methods and real camouflaged imagery: generated images either lack sufficient camouflage because of weak visual similarity between foreground and background, or exhibit cluttered backgrounds that are semantically inconsistent with the foreground targets. To address these limitations, we propose ReamCamo, a unified out-painting-based framework for realistic camouflaged image generation. ReamCamo introduces explicit layout controls to regulate global image structure, thereby improving semantic coherence between foreground objects and generated backgrounds. Moreover, we construct a multi-modal textual-visual condition by combining a unified fine-grained textual task description with texture-oriented background retrieval; together these guide the generation process to enhance visual fidelity and realism. To quantitatively assess camouflage quality, we further introduce a background-foreground distribution divergence metric that measures the effectiveness of camouflage in generated images. Extensive experiments and visualizations demonstrate the effectiveness of the proposed framework.
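The abstract does not define the background-foreground distribution divergence metric, so the following is only an illustrative sketch of the general idea, not the paper's formulation. It assumes a grayscale image given as nested lists of 0-255 intensities, a binary foreground mask of the same shape, intensity histograms as the "distributions", and Jensen-Shannon divergence as the divergence. All function names (`histogram`, `js_divergence`, `fg_bg_divergence`) are hypothetical.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[int], bins: int = 16, max_value: int = 255) -> List[float]:
    """Normalized intensity histogram of a pixel region (assumed feature choice)."""
    if not values:
        raise ValueError("region is empty")
    counts = [0] * bins
    for v in values:
        # Map intensity v in [0, max_value] to a bin index in [0, bins - 1].
        idx = min(v * bins // (max_value + 1), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def fg_bg_divergence(image: List[List[int]], mask: List[List[int]], bins: int = 16) -> float:
    """Divergence between foreground and background intensity distributions.

    Assumption: lower values mean the foreground blends into the background
    more closely, i.e. stronger camouflage under this simplified measure.
    """
    fg = [px for row, mrow in zip(image, mask) for px, m in zip(row, mrow) if m]
    bg = [px for row, mrow in zip(image, mask) for px, m in zip(row, mrow) if not m]
    return js_divergence(histogram(fg, bins), histogram(bg, bins))


if __name__ == "__main__":
    blended = fg_bg_divergence([[10, 10], [10, 10]], [[1, 0], [0, 1]])
    salient = fg_bg_divergence([[0, 255], [0, 255]], [[1, 0], [1, 0]])
    print(f"blended foreground: {blended:.3f}, salient foreground: {salient:.3f}")
```

Under these assumptions a well-camouflaged object scores near 0 and a highly salient one scores near 1. A practical version would more likely compare texture or deep-feature statistics than raw intensities, but the abstract does not say which features the authors use.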