M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization

📅 2025-06-25

🤖 AI Summary

Fine-grained image forgery localization faces three main challenges: subtle manipulations are hard to detect, computational overhead is high, and models generalize poorly across domains.

Method: This paper proposes a Transformer-based framework that fuses multi-frequency and multi-scale features. It unifies multi-frequency and multi-scale attention within the skip connections and introduces a curvature-based global prior map that serves as an edge-aware difficulty signal, adaptively increasing the model's sensitivity to small forgeries. Combining multi-spectral features with global context modeling strengthens the learned representation.

Contribution/Results: The method achieves state-of-the-art performance on multiple benchmark datasets. It maintains high localization accuracy on unseen manipulation types and in cross-domain settings, demonstrating strong robustness and computational efficiency.
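The summary does not say how the multi-frequency attention is computed. One common way to make channel attention frequency-aware is to project each feature channel onto a few 2-D DCT basis functions and turn the responses into a per-channel gate. The sketch below illustrates that idea in plain Python; the function names, the frequency set, and the sum-then-sigmoid fusion rule are assumptions for illustration, not M2SFormer's published design.

```python
import math


def dct_basis(h, w, u, v):
    """Unnormalized 2-D DCT-II basis function B_uv(i, j) on an h x w grid."""
    return [
        [math.cos(math.pi * u * (i + 0.5) / h) * math.cos(math.pi * v * (j + 0.5) / w)
         for j in range(w)]
        for i in range(h)
    ]


def freq_descriptor(channel, freqs):
    """Mean response of one H x W channel to each selected DCT basis (one value per frequency)."""
    h, w = len(channel), len(channel[0])
    desc = []
    for u, v in freqs:
        basis = dct_basis(h, w, u, v)
        total = sum(channel[i][j] * basis[i][j] for i in range(h) for j in range(w))
        desc.append(total / (h * w))
    return desc


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_frequency_channel_attention(features, freqs=((0, 0), (0, 1), (1, 0), (1, 1))):
    """Gate each channel by its multi-frequency descriptor.

    features: list of C channels, each an H x W list of floats.
    Returns (reweighted_features, gates). The fusion rule below (sum of absolute
    responses, then sigmoid) is a hypothetical stand-in for a learned layer.
    """
    gates = []
    for channel in features:
        desc = freq_descriptor(channel, freqs)
        gates.append(sigmoid(sum(abs(d) for d in desc)))
    # Rescale every pixel of a channel by that channel's gate.
    reweighted = [
        [[g * x for x in row] for row in channel]
        for g, channel in zip(gates, features)
    ]
    return reweighted, gates
```

The (0, 0) basis is constant, so its response equals the channel mean (global average pooling); the higher-frequency bases add sensitivity to spatial variation that plain averaging discards.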

📝 Abstract

Image editing techniques have rapidly advanced, facilitating both innovative use cases and malicious manipulation of digital images. Deep learning-based methods have recently achieved high accuracy in pixel-level forgery localization, yet they frequently struggle with computational overhead and limited representation power, particularly for subtle or complex tampering. In this paper, we propose M2SFormer, a novel Transformer encoder-based framework designed to overcome these challenges. Unlike approaches that process spatial and frequency cues separately, M2SFormer unifies multi-frequency and multi-scale attentions in the skip connection, harnessing global context to better capture diverse forgery artifacts. Additionally, our framework addresses the loss of fine detail during upsampling by utilizing a global prior map, a curvature metric indicating the difficulty of forgery localization, which then guides a difficulty-guided attention module to preserve subtle manipulations more effectively. Extensive experiments on multiple benchmark datasets demonstrate that M2SFormer outperforms existing state-of-the-art models, offering superior generalization in detecting and localizing forgeries across unseen domains.
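The abstract describes the global prior map only as a curvature metric that indicates localization difficulty. As one hypothetical reading, the sketch below computes the level-set curvature of a 2-D map (for example, a coarse forgery-probability map) with central differences and averages its magnitude over pixels that have a usable gradient: straight boundaries score zero, while tightly curved or irregular ones score higher. The helpers curvature_map and difficulty_score and the threshold tau are illustrative assumptions, not the paper's definition.

```python
def curvature_map(p, tau=1e-6):
    """Level-set curvature of a 2-D map p (list of rows) via central differences.

    kappa = (p_xx * p_y**2 - 2 * p_x * p_y * p_xy + p_yy * p_x**2) / |grad p|**3

    Returns (kappa, valid). Border pixels and near-flat pixels (|grad p| < tau),
    where curvature is undefined, are marked invalid and left at 0.
    """
    h, w = len(p), len(p[0])
    kappa = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            px = (p[i][j + 1] - p[i][j - 1]) / 2.0
            py = (p[i + 1][j] - p[i - 1][j]) / 2.0
            pxx = p[i][j + 1] - 2.0 * p[i][j] + p[i][j - 1]
            pyy = p[i + 1][j] - 2.0 * p[i][j] + p[i - 1][j]
            pxy = (p[i + 1][j + 1] - p[i + 1][j - 1]
                   - p[i - 1][j + 1] + p[i - 1][j - 1]) / 4.0
            g2 = px * px + py * py
            if g2 < tau * tau:
                continue  # flat region: no level-set direction, skip
            kappa[i][j] = (pxx * py * py - 2.0 * px * py * pxy + pyy * px * px) / g2 ** 1.5
            valid[i][j] = True
    return kappa, valid


def difficulty_score(p, tau=1e-6):
    """Hypothetical global difficulty: mean |kappa| over pixels with a usable gradient."""
    kappa, valid = curvature_map(p, tau)
    vals = [abs(k) for krow, vrow in zip(kappa, valid) for k, ok in zip(krow, vrow) if ok]
    return sum(vals) / len(vals) if vals else 0.0
```

For p = x² + y², whose level sets are circles of radius r, the formula gives exactly 1/r at interior pixels, so smaller, tighter regions register as harder, while a straight step edge scores 0.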

Problem

Localizing subtle or complex image forgeries accurately

Reducing computational overhead in forgery detection

Preserving fine details during forgery localization

Innovation

Unifies multi-frequency and multi-scale attention in the skip connections

Uses a global prior map to preserve fine details during upsampling

Guides attention with a curvature-based difficulty metric
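How the difficulty signal enters the attention module is not specified here. A minimal, hypothetical sketch is to let a scalar difficulty d sharpen scaled dot-product attention by multiplying the logits by (1 + gain · d), which is equivalent to lowering the softmax temperature, so harder samples concentrate attention on the strongest responses. The modulation rule, the function name, and the gain parameter are assumptions made for illustration.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def difficulty_guided_attention(query, keys, values, difficulty, gain=1.0):
    """Scaled dot-product attention for one query, with logits sharpened by difficulty.

    Hypothetical rule: logits are multiplied by (1 + gain * difficulty), i.e. the
    softmax temperature drops as the prior map reports a harder sample.
    Returns (output_vector, attention_weights).
    """
    d_k = len(query)
    scale = (1.0 + gain * difficulty) / math.sqrt(d_k)
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(logits)
    dim = len(values[0])
    # Weighted sum of value vectors, one output component at a time.
    out = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]
    return out, weights
```

With d = 0 this reduces to ordinary scaled dot-product attention, so the guidance only changes behavior when the prior map reports a difficult input.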

🔎 Similar Papers

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models