🤖 AI Summary
Existing image similarity metrics inadequately assess whether generative AI outputs preserve Scene Composition Structure (SCS): pixel-wise methods are noise-sensitive, perceptual metrics prioritize aesthetics over geometric consistency, and deep learning-based approaches suffer from high training overhead and limited generalizability. To address this, we propose SCSSIM, a training-free, analytical metric for quantifying SCS fidelity. SCSSIM derives statistical measures from a cuboidal hierarchical partitioning of the image, capturing non-object-based structural relationships that reflect the relative positions, sizes, and orientations of objects and background. It thus provides a quantitative characterization of SCS preservation. Experiments demonstrate that SCSSIM is highly invariant to non-compositional distortions, decreases strongly and monotonically under compositional distortions, and exhibits superior properties to existing metrics for structural fidelity evaluation. As a result, SCSSIM offers an interpretable, reliable, and training-free tool for structural assessment of generative models.
📝 Abstract
The rapid advancement of generative AI models necessitates novel methods for evaluating image quality that extend beyond human perception. A critical concern for these models is the preservation of an image's underlying Scene Composition Structure (SCS), which defines the geometric relationships among objects and the background, including their relative positions, sizes, and orientations. Maintaining SCS integrity is paramount for ensuring faithful and structurally accurate GenAI outputs. Traditional image similarity metrics often fall short in assessing SCS. Pixel-level approaches are overly sensitive to minor visual noise, while perception-based metrics prioritize human aesthetic appeal; neither adequately captures structural fidelity. Furthermore, recent neural-network-based metrics introduce training overheads and potential generalization issues. We introduce the SCS Similarity Index Measure (SCSSIM), a novel, analytical, and training-free metric that quantifies SCS preservation by exploiting statistical measures derived from the cuboidal hierarchical partitioning of images, robustly capturing non-object-based structural relationships. Our experiments demonstrate that SCSSIM is highly invariant to non-compositional distortions, accurately reflecting unchanged SCS. Conversely, it decreases strongly and monotonically under compositional distortions, precisely indicating when SCS has been altered. Compared to existing metrics, SCSSIM exhibits superior properties for structural evaluation, making it an invaluable tool for developing and evaluating generative models and for ensuring the integrity of scene composition.