AI Summary
Problem: Fine-grained image forgery localization remains difficult: subtle manipulations are hard to detect, computational overhead is high, and cross-domain generalization is poor. Method: This paper proposes a Transformer-based framework that fuses multi-frequency and multi-scale features. It unifies multi-frequency and multi-scale attention modeling within the skip connections and introduces, for the first time, a curvature-based global prior map as an edge-aware difficulty guidance signal that dynamically increases sensitivity to minute forgeries. By integrating multispectral features with global contextual modeling, the framework strengthens representational capacity. Contribution/Results: The method achieves state-of-the-art performance across multiple benchmark datasets. Notably, it maintains high localization accuracy on unseen manipulation types and under cross-domain settings, demonstrating robustness and computational efficiency.
Abstract
Image editing techniques have rapidly advanced, facilitating both innovative use cases and malicious manipulation of digital images. Deep learning-based methods have recently achieved high accuracy in pixel-level forgery localization, yet they frequently struggle with computational overhead and limited representation power, particularly for subtle or complex tampering. In this paper, we propose M2SFormer, a novel Transformer encoder-based framework designed to overcome these challenges. Unlike approaches that process spatial and frequency cues separately, M2SFormer unifies multi-frequency and multi-scale attentions in the skip connection, harnessing global context to better capture diverse forgery artifacts. Additionally, our framework addresses the loss of fine detail during upsampling by utilizing a global prior map, a curvature metric indicating the difficulty of forgery localization, which then guides a difficulty-guided attention module to preserve subtle manipulations more effectively. Extensive experiments on multiple benchmark datasets demonstrate that M2SFormer outperforms existing state-of-the-art models, offering superior generalization in detecting and localizing forgeries across unseen domains.
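To make the idea of a curvature-based global prior map more concrete, here is a minimal NumPy sketch. The abstract does not define the curvature metric or the difficulty-guided attention module, so everything below is an assumption made for illustration: curvature is approximated by the absolute discrete Laplacian of a coarse prediction map, and the hypothetical functions `curvature_map` and `difficulty_guided_reweight` are not taken from M2SFormer.

```python
import numpy as np


def curvature_map(prior: np.ndarray) -> np.ndarray:
    """Assumed difficulty proxy: |discrete Laplacian| of a 2-D map, scaled to [0, 1].

    `prior` is a coarse (H, W) forgery probability map. High values in the
    result mark pixels where the map bends sharply, e.g. region boundaries.
    """
    # Replicate the border so the 4-neighbour stencil is defined everywhere.
    padded = np.pad(prior, 1, mode="edge")
    laplacian = (
        padded[:-2, 1:-1]    # neighbour above
        + padded[2:, 1:-1]   # neighbour below
        + padded[1:-1, :-2]  # neighbour to the left
        + padded[1:-1, 2:]   # neighbour to the right
        - 4.0 * prior
    )
    curvature = np.abs(laplacian)
    span = curvature.max() - curvature.min()
    if span == 0:
        # A flat map has no curvature anywhere.
        return np.zeros_like(curvature)
    return (curvature - curvature.min()) / span


def difficulty_guided_reweight(features: np.ndarray, prior: np.ndarray) -> np.ndarray:
    """Hypothetical guidance step: amplify (C, H, W) features where difficulty is high."""
    difficulty = curvature_map(prior)
    # Residual form keeps easy regions unchanged and at most doubles hard ones.
    return features * (1.0 + difficulty[None, :, :])


if __name__ == "__main__":
    prior = np.zeros((8, 8))
    prior[2:6, 2:6] = 1.0              # a square "tampered" region
    feats = np.ones((3, 8, 8))         # dummy decoder features
    out = difficulty_guided_reweight(feats, prior)
    print(out[0].round(2))
```

In this toy example the corners and edges of the square region receive the largest weights, while the flat interior and background are left unchanged. This mirrors the stated goal of preserving subtle, boundary-level detail during upsampling, though the actual module may differ substantially.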