🤖 AI Summary
This study addresses a key limitation of existing audio deepfake detection methods: they analyze only the audio signal, overlooking contextual semantics and speech transcriptions, and therefore struggle to identify highly realistic forgeries targeting public figures. To overcome this, the work proposes CADD, a novel multimodal fusion architecture that jointly models acoustic features, transcribed text, and contextual semantics. Evaluated on both a real-world, journalist-contributed deepfake dataset (JDD) and a synthetic dataset (SYN), CADD achieves substantial improvements over prior approaches, ranging from 5% to 37.58% in F1 score, 3.77% to 42.79% in AUC, and 6.17% to 47.83% in EER. Moreover, the model demonstrates strong robustness, with an average performance degradation of only 0.71% under various adversarial attacks, enhancing both detection accuracy and resilience.
📝 Abstract
Humans use context to assess the veracity of information, but current audio deepfake detectors analyze only the audio file, considering neither context nor transcripts. We create and analyze a Journalist-provided Deepfake Dataset (JDD) of 255 public deepfakes, contributed primarily by over 70 journalists since early 2024. We also generate a synthetic audio dataset (SYN) of deceased public figures and propose a novel Context-based Audio Deepfake Detector (CADD) architecture. In addition, we evaluate performance on two large-scale datasets: ITW and P$^2$V. We show that sufficient context and/or the transcript can significantly improve the efficacy of audio deepfake detectors: across multiple baseline detectors and traditional classifiers, performance improves by 5%-37.58% in F1 score, 3.77%-42.79% in AUC, and 6.17%-47.83% in EER. We additionally show that CADD, through its use of context and/or transcripts, is more robust to 5 adversarial evasion strategies, limiting average performance degradation to just 0.71% across all experiments. Code, models, and datasets are available at our project page: https://sites.northwestern.edu/nsail/cadd-context-based-audio-deepfake-detection (access restricted during review).