🤖 AI Summary
Deepfake detection suffers from high technical complexity and poor interpretability, hindering comprehension by non-expert users. Method: This paper proposes a three-stage explainable detection framework (forgery localization, natural-language simplification, and guided image reconstruction) designed for users with diverse educational backgrounds. It is the first to integrate vision-language large models with controllable image editing to provide dual-path explanations: technical mechanism and accessible semantics. The system generates localization heatmaps, concise natural-language rationales, and verifiable reconstructed images end-to-end, substantially reducing cognitive load. Contribution/Results: User studies demonstrate that the framework improves non-experts' clarity in understanding deepfakes by 37.2% and boosts their confidence in detection by 41.5%, significantly enhancing the transparency, credibility, and public accessibility of deepfake detection outcomes.
📝 Abstract
This demonstration paper presents $\mathbf{LayLens}$, a tool designed to make deepfake understanding easier for users of all educational backgrounds. While prior works often rely on outputs containing technical jargon, LayLens bridges the gap between model reasoning and human understanding through a three-stage pipeline: (1) explainable deepfake detection using a state-of-the-art forgery localization model, (2) natural language simplification of technical explanations using a vision-language model, and (3) visual reconstruction of a plausible original image via guided image editing. The interface presents both technical and layperson-friendly explanations, in addition to a side-by-side comparison of the uploaded and reconstructed images. A user study with 15 participants shows that simplified explanations significantly improve clarity and reduce cognitive load, with most users expressing increased confidence in identifying deepfakes. LayLens offers a step toward transparent, trustworthy, and user-centric deepfake forensics.
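To make the three-stage pipeline concrete, below is a minimal Python sketch of how such a system might be wired together. Every function, signature, and model choice here is a hypothetical placeholder for illustration only; the paper does not specify this code, and the actual LayLens implementation may differ substantially.

```python
# Hypothetical sketch of a LayLens-style three-stage pipeline.
# All APIs below are illustrative stubs, not the authors' implementation.

from dataclasses import dataclass
from typing import Any


@dataclass
class ExplanationBundle:
    heatmap: Any              # stage 1: forgery-localization heatmap
    technical_report: str     # stage 1: model-level (jargon-heavy) explanation
    layperson_summary: str    # stage 2: simplified natural-language rationale
    reconstructed_image: Any  # stage 3: plausible "original" image


def localize_forgery(image):
    """Stage 1: run a forgery-localization model and return a heatmap
    plus a technical explanation of the detected manipulations."""
    raise NotImplementedError  # e.g., a state-of-the-art localization model


def simplify_explanation(image, technical_report):
    """Stage 2: prompt a vision-language model to rewrite the technical
    report in plain, jargon-free language for non-expert users."""
    raise NotImplementedError  # e.g., an instruction-tuned VLM


def reconstruct_original(image, heatmap):
    """Stage 3: apply guided image editing, conditioned on the localized
    regions, to produce a plausible unedited version of the input."""
    raise NotImplementedError  # e.g., an inpainting / controllable editor


def analyze(image) -> ExplanationBundle:
    """End-to-end flow: detection -> simplification -> reconstruction,
    yielding both explanation paths and the side-by-side comparison image."""
    heatmap, technical_report = localize_forgery(image)
    layperson_summary = simplify_explanation(image, technical_report)
    reconstructed_image = reconstruct_original(image, heatmap)
    return ExplanationBundle(heatmap, technical_report,
                             layperson_summary, reconstructed_image)
```

In this sketch, the `ExplanationBundle` mirrors what the interface described above would display: the heatmap and technical report for expert users, the simplified summary for laypersons, and the reconstructed image for side-by-side comparison with the upload.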