NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation

📅 2025-09-03

🤖 AI Summary

To improve the robustness of frame-level forgery localization in partial audio deepfake detection (PADD), this paper proposes NE-PADD, a PADD framework that incorporates named entity (NE) knowledge. It uses a dual-branch architecture that runs Speech Named Entity Recognition (SpeechNER) in parallel with PADD and links the two through Attention Fusion (AF), which combines attention weights, and Attention Transfer (AT), which guides the PADD branch with named entity semantics through an auxiliary loss. Experiments on the PartialSpoof-NER dataset show that NE-PADD outperforms existing baselines. The implementation is publicly available at https://github.com/AI-S2-Lab/NE-PADD.

📝 Abstract

Unlike traditional sentence-level audio deepfake detection (ADD), partial audio deepfake detection (PADD) requires locating fake speech at the frame level. While some progress has been made in this area, leveraging semantic information from audio, especially named entities, remains underexplored. To this end, we propose NE-PADD, a novel PADD method that leverages named entity knowledge through two parallel branches: Speech Named Entity Recognition (SpeechNER) and PADD. The approach incorporates two attention aggregation mechanisms: Attention Fusion (AF), which combines attention weights, and Attention Transfer (AT), which guides PADD with named entity semantics through an auxiliary loss. Experiments on the PartialSpoof-NER dataset show that our method outperforms existing baselines, demonstrating the effectiveness of integrating named entity knowledge in PADD. The code is available at https://github.com/AI-S2-Lab/NE-PADD.
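The abstract names the two aggregation mechanisms but does not give their formulas, so the following is only a rough sketch of how such mechanisms could look, not the authors' implementation. It assumes each branch yields one attention weight per frame, models Attention Fusion as a weighted average of the two weight vectors, and models the Attention Transfer auxiliary loss as a mean squared difference between them. The function names, the `alpha` mixing parameter, and the example scores are all hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(padd_attn, ner_attn, alpha=0.5):
    """Assumed form of AF: weighted average of the two per-frame weight vectors."""
    if len(padd_attn) != len(ner_attn):
        raise ValueError("both branches must cover the same frames")
    fused = [alpha * p + (1.0 - alpha) * n for p, n in zip(padd_attn, ner_attn)]
    total = sum(fused)  # renormalize as a safeguard against rounding drift
    return [f / total for f in fused]


def attention_transfer_loss(padd_attn, ner_attn):
    """Assumed form of the AT auxiliary loss: mean squared gap between the maps."""
    if len(padd_attn) != len(ner_attn):
        raise ValueError("both branches must cover the same frames")
    return sum((p - n) ** 2 for p, n in zip(padd_attn, ner_attn)) / len(padd_attn)


# Hypothetical raw attention scores for four frames from each branch.
padd_attn = softmax([0.1, 0.2, 2.0, 0.3])
ner_attn = softmax([0.0, 0.1, 1.5, 1.4])

fused = attention_fusion(padd_attn, ner_attn)
aux_loss = attention_transfer_loss(padd_attn, ner_attn)
```

In this sketch the auxiliary loss is zero only when the two attention maps agree, so adding it to the main detection loss would pull the PADD branch's attention toward the frames the SpeechNER branch attends to. The actual fusion rule and loss used in NE-PADD may differ; the linked repository is the authoritative source.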

Problem

Research questions and friction points this paper is trying to address.

Detecting frame-level fake speech segments in audio

Leveraging named entity semantics for robust detection

Integrating attention mechanisms with named entity recognition
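To make the frame-level framing of the problem concrete, here is a small illustrative sketch of how a partially fake utterance could be represented as per-frame labels. The 20 ms frame shift, the millisecond segment format, and the function name are assumptions chosen for illustration and are not taken from the paper.

```python
def frame_labels(num_frames, fake_segments_ms, frame_shift_ms=20):
    """Return one label per frame: 1 if the frame overlaps a fake segment, else 0.

    Frame i is assumed to cover [i * frame_shift_ms, (i + 1) * frame_shift_ms).
    Segments are (start_ms, end_ms) pairs with an exclusive end.
    """
    labels = [0] * num_frames
    for start_ms, end_ms in fake_segments_ms:
        first = start_ms // frame_shift_ms                   # first overlapping frame
        last = min(num_frames, -(-end_ms // frame_shift_ms))  # ceiling division
        for i in range(first, last):
            labels[i] = 1
    return labels


# A 200 ms utterance (10 frames) with a fake region from 40 ms to 100 ms.
labels = frame_labels(10, [(40, 100)])

# Sentence-level ADD would reduce this to a single decision for the utterance.
utterance_is_fake = max(labels) == 1
```

The contrast with sentence-level detection is that a PADD system has to predict the whole `labels` sequence, whereas a sentence-level detector only needs the single `utterance_is_fake` decision.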

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging named entity knowledge for detection

Using attention fusion and transfer mechanisms

Integrating SpeechNER and PADD parallel branches
