M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization

📅 2025-06-25

🤖 AI Summary
Problem: Fine-grained image forgery localization faces three challenges: subtle manipulations are difficult to detect, computational overhead is high, and cross-domain generalization is poor. Method: This paper proposes a Transformer-based framework that fuses multi-frequency and multi-scale features. It unifies multi-frequency and multi-scale attention modeling within the skip connections and introduces, for the first time, a curvature-based global prior map that serves as an edge-aware difficulty guidance signal, dynamically increasing sensitivity to minute forgeries. By integrating multispectral features with global contextual modeling, the framework strengthens its representational capacity. Contribution/Results: The method achieves state-of-the-art performance across multiple benchmark datasets. Notably, it maintains high localization accuracy on unseen manipulation types and in cross-domain settings, demonstrating superior robustness and computational efficiency.
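Neither the summary nor the abstract specifies how the multi-frequency and multi-scale attention is computed. As a rough illustration only, the Python sketch below builds a per-channel descriptor in the style of FcaNet's DCT-based multi-spectral channel attention: each channel is projected onto a few 2D DCT basis functions at two spatial scales, and a gate turns the descriptor into a channel weight. The function names, the chosen frequencies and scales, and the sigmoid-of-the-mean gate (a stand-in for a learned MLP) are all hypothetical, and the sketch leaves out the Transformer encoder and the skip-connection wiring entirely.

```python
import math


def dct_basis(h, w, u, v):
    """2D DCT-II basis function for frequency (u, v), sampled on an h x w grid."""
    return [[math.cos(math.pi * u * (i + 0.5) / h) * math.cos(math.pi * v * (j + 0.5) / w)
             for j in range(w)] for i in range(h)]


def freq_pool(channel, u, v):
    """Project one H x W channel onto a DCT basis, normalized by H * W.
    Frequency (0, 0) reduces exactly to global average pooling."""
    h, w = len(channel), len(channel[0])
    basis = dct_basis(h, w, u, v)
    total = sum(channel[i][j] * basis[i][j] for i in range(h) for j in range(w))
    return total / (h * w)


def avg_downsample(channel, factor):
    """Average-pool a 2D map by an integer factor (assumes sides divisible by factor)."""
    h, w = len(channel) // factor, len(channel[0]) // factor
    return [[sum(channel[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(w)] for i in range(h)]


def multi_freq_multi_scale_attention(features, freqs=((0, 0), (0, 1), (1, 0)), scales=(1, 2)):
    """Hypothetical channel attention. features: list of C channels, each H x W floats.
    Returns (reweighted features, per-channel weights)."""
    weights = []
    for ch in features:
        desc = []  # one DCT coefficient per (scale, frequency) pair
        for s in scales:
            x = ch if s == 1 else avg_downsample(ch, s)
            desc.extend(freq_pool(x, u, v) for u, v in freqs)
        # Hypothetical gate: sigmoid of the mean coefficient
        # (a real network would use a learned excitation MLP here).
        z = sum(desc) / len(desc)
        weights.append(1.0 / (1.0 + math.exp(-z)))
    out = [[[wgt * val for val in row] for row in ch] for wgt, ch in zip(weights, features)]
    return out, weights


if __name__ == "__main__":
    ones = [[1.0] * 4 for _ in range(4)]
    zeros = [[0.0] * 4 for _ in range(4)]
    _, w = multi_freq_multi_scale_attention([ones, zeros])
    print([round(x, 4) for x in w])  # prints [0.5826, 0.5]
```

Because the (0, 0) projection is exactly average pooling, this descriptor is a superset of the squeeze step in standard channel attention: the extra frequencies add information without discarding the usual pooled statistic.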

Abstract
Image editing techniques have rapidly advanced, facilitating both innovative use cases and malicious manipulation of digital images. Deep learning-based methods have recently achieved high accuracy in pixel-level forgery localization, yet they frequently struggle with computational overhead and limited representation power, particularly for subtle or complex tampering. In this paper, we propose M2SFormer, a novel Transformer encoder-based framework designed to overcome these challenges. Unlike approaches that process spatial and frequency cues separately, M2SFormer unifies multi-frequency and multi-scale attentions in the skip connection, harnessing global context to better capture diverse forgery artifacts. Additionally, our framework addresses the loss of fine detail during upsampling by utilizing a global prior map, a curvature metric indicating the difficulty of forgery localization, which then guides a difficulty-guided attention module to preserve subtle manipulations more effectively. Extensive experiments on multiple benchmark datasets demonstrate that M2SFormer outperforms existing state-of-the-art models, offering superior generalization in detecting and localizing forgeries across unseen domains.
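The abstract does not say how the curvature-based global prior map is computed or how the difficulty-guided attention module consumes it. The sketch below is one plausible reading, not the authors' method: it assumes the prior is the level-set curvature κ = div(∇p / |∇p|) of a coarse forgery-probability map p, rescales |κ| to [0, 1] as a difficulty map (high where the predicted boundary bends sharply, zero in flat regions), and applies a hypothetical guidance rule f · (1 + d) that amplifies features where localization is hard. All function names and the guidance rule are illustrative assumptions.

```python
import math


def _get(m, i, j):
    """Read m[i][j], replicating border values for out-of-range indices."""
    h, w = len(m), len(m[0])
    return m[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]


def curvature_map(p, eps=1e-8):
    """Level-set curvature kappa = div(grad p / |grad p|) via central differences.
    p: H x W list of floats, e.g. a coarse forgery-probability map (assumption)."""
    h, w = len(p), len(p[0])
    nx = [[0.0] * w for _ in range(h)]
    ny = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = (_get(p, i, j + 1) - _get(p, i, j - 1)) / 2.0
            gy = (_get(p, i + 1, j) - _get(p, i - 1, j)) / 2.0
            norm = math.sqrt(gx * gx + gy * gy) + eps  # eps avoids 0/0 in flat regions
            nx[i][j], ny[i][j] = gx / norm, gy / norm
    # Divergence of the unit-normal field.
    return [[(_get(nx, i, j + 1) - _get(nx, i, j - 1)) / 2.0
             + (_get(ny, i + 1, j) - _get(ny, i - 1, j)) / 2.0
             for j in range(w)] for i in range(h)]


def difficulty_map(p):
    """|curvature| rescaled to [0, 1]; larger means harder to localize (assumption)."""
    k = [[abs(v) for v in row] for row in curvature_map(p)]
    peak = max(max(row) for row in k)
    if peak == 0.0:
        return [[0.0] * len(row) for row in k]
    return [[v / peak for v in row] for row in k]


def difficulty_guided(features, d):
    """Hypothetical guidance rule: scale every channel pixel-wise by (1 + d)."""
    return [[[f * (1.0 + dv) for f, dv in zip(frow, drow)]
             for frow, drow in zip(ch, d)] for ch in features]


if __name__ == "__main__":
    # Toy 9 x 9 map: a disk of radius 2 marked as "forged".
    disk = [[1.0 if (i - 4) ** 2 + (j - 4) ** 2 <= 4 else 0.0 for j in range(9)]
            for i in range(9)]
    d = difficulty_map(disk)
    print(d[4][4], d[0][0], max(max(r) for r in d))  # prints 0.0 0.0 1.0
```

On the toy disk, the interior and the far background score 0 while boundary pixels score highest. That is the qualitative behavior an edge-aware difficulty signal needs: attention is pushed toward object contours, where fine detail is most easily lost during upsampling.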

Problem

Localizing subtle or complex image forgeries accurately
Reducing computational overhead in forgery detection
Preserving fine details during forgery localization

Innovation

Unifies multi-frequency and multi-scale attention in the skip connections
Uses a global prior map to preserve fine detail during upsampling
Guides attention with a curvature-based difficulty metric