MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes an interpretable deepfake detection framework built on vision-language models, addressing the challenge of simultaneously achieving high accuracy, interpretability, and alignment with human cognition as generative models evolve rapidly. For the first time in this domain, the approach incorporates reinforcement learning from human feedback (RLHF) together with a forgery disentanglement mechanism. It enables joint spatial-textual reasoning through multimodal alignment, leverages RLHF to generate explanations consistent with human preferences, and introduces a dedicated disentanglement module that isolates forgery traces from high-level semantic representations. Experimental results demonstrate state-of-the-art performance across multiple evaluation metrics, with generated explanations significantly outperforming existing approaches in both accuracy and reliability.
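
The forgery disentanglement idea can be pictured as two projection heads over shared visual features: one carrying high-level facial semantics, the other carrying forgery traces, with a penalty that pushes the two apart so the classifier relies only on the forgery branch. The PyTorch sketch below is an illustrative assumption, not the paper's actual module; the class name, layer sizes, and the cosine-based decorrelation penalty are placeholders.

```python
# Minimal sketch of a forgery-disentanglement head (assumed design, not the paper's exact module).
import torch
import torch.nn as nn

class ForgeryDisentangler(nn.Module):
    def __init__(self, feat_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        # Project shared visual features into a semantic branch and a forgery branch.
        self.semantic_head = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.GELU())
        self.forgery_head = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.GELU())
        # Real/fake logit is computed from the forgery branch only.
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, feats: torch.Tensor):
        sem = self.semantic_head(feats)    # high-level facial semantics
        forg = self.forgery_head(feats)    # intrinsic forgery traces
        logit = self.classifier(forg).squeeze(-1)
        # Decorrelation penalty: squared cosine similarity encourages the two
        # branches to carry disjoint information.
        sem_n = nn.functional.normalize(sem, dim=-1)
        forg_n = nn.functional.normalize(forg, dim=-1)
        disentangle_loss = (sem_n * forg_n).sum(dim=-1).pow(2).mean()
        return logit, disentangle_loss
```

In such a setup the disentanglement loss would be added to the detection loss during training, so forgery cues are separated from identity and expression content before classification.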

📝 Abstract
Deepfake detection is a widely researched topic that is crucial for combating the spread of malicious content, with existing methods mainly modeling the problem as classification or spatial localization. The rapid advancement of generative models imposes new demands on Deepfake detection. In this paper, we propose multimodal alignment and reinforcement for explainable Deepfake detection via vision-language models, termed MARE, which aims to enhance the accuracy and reliability of Vision-Language Models (VLMs) in Deepfake detection and reasoning. Specifically, MARE designs comprehensive reward functions, incorporating reinforcement learning from human feedback (RLHF), to incentivize the generation of text-spatially aligned reasoning content that adheres to human preferences. In addition, MARE introduces a forgery disentanglement module to capture intrinsic forgery traces from high-level facial semantics, thereby improving its authenticity detection capability. We conduct thorough evaluations of the reasoning content generated by MARE. Both quantitative and qualitative experimental results demonstrate that MARE achieves state-of-the-art performance in terms of accuracy and reliability.
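
The reward design can be imagined as a weighted sum of a correctness term, a spatial-alignment term (e.g., overlap between the region an explanation points to and an annotated forgery region), and a human-preference term from a learned reward model. The function below is a hedged sketch under those assumptions; the weights, the IoU-based spatial reward, and the preference_score input are illustrative and not taken from the paper.

```python
# Illustrative composite reward for RLHF-style fine-tuning (assumed formulation).
def composite_reward(pred_label: int, true_label: int,
                     pred_box: tuple, gt_box: tuple,
                     preference_score: float,
                     w_cls: float = 1.0, w_spatial: float = 0.5, w_pref: float = 0.5) -> float:
    """Combine classification, spatial-alignment, and human-preference rewards."""
    # Correctness reward: 1 if the authenticity prediction matches the label.
    r_cls = 1.0 if pred_label == true_label else 0.0

    # Spatial reward: IoU between the region cited in the explanation and the
    # annotated forgery region; boxes are (x1, y1, x2, y2).
    x1 = max(pred_box[0], gt_box[0]); y1 = max(pred_box[1], gt_box[1])
    x2 = min(pred_box[2], gt_box[2]); y2 = min(pred_box[3], gt_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_g = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    r_spatial = inter / (area_p + area_g - inter + 1e-8)

    # Preference reward: score from a reward model trained on human feedback,
    # assumed here to lie in [0, 1].
    r_pref = preference_score

    return w_cls * r_cls + w_spatial * r_spatial + w_pref * r_pref
```

A policy-gradient method (e.g., PPO-style RLHF) would then optimize the VLM to maximize this combined signal, so explanations are rewarded only when they are both spatially grounded and preferred by annotators.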
Problem

Research questions and friction points this paper is trying to address.

Deepfake detection
explainability
vision-language models
multimodal alignment
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models
Reinforcement Learning from Human Feedback
Multimodal Alignment
Forgery Disentanglement
Explainable Deepfake Detection