Multi-source Multimodal Progressive Domain Adaption for Audio-Visual Deception Detection

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the shift between multiple source domains and a target domain in cross-domain audio-visual deception detection, this paper proposes a Multi-source Multimodal Progressive Domain Adaptation (MMPDA) framework. MMPDA progressively aligns domains at both the feature and decision levels: multi-source adversarial training aligns cross-domain feature distributions, while category-consistency constraints keep decisions consistent across domains. By integrating the audio and visual modalities, MMPDA bridges the distribution gaps between the source domains and the target domain in stages. On stage 2 of the MMDD Challenge, MMPDA achieves 60.43% accuracy and a 56.99% F1-score, surpassing the first-place team by 5.59 percentage points in F1 and the third-place team by 6.75 percentage points in accuracy. These results demonstrate MMPDA's effectiveness under complex multi-source domain shifts.
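The feature-level idea above can be illustrated with a generic multi-source domain-adversarial objective in the style of DANN (Ganin and Lempitsky). This is a minimal sketch under stated assumptions, not the authors' implementation: the linear discriminator, its weights, and the gradient-reversal ramp are hypothetical choices, and only the forward loss is computed in plain Python. A discriminator tries to tell which of the S source domains or the target a feature came from; the feature extractor is trained to maximize the same loss, and a coefficient that grows over training is one plausible way to make the alignment progressive.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def domain_logits(feature: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear domain discriminator: one weight row per domain."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def domain_adversarial_loss(features, domain_ids, weights) -> float:
    """Mean cross-entropy of predicting each sample's domain.

    Domain ids 0..S-1 are source domains and S is the target (an assumption).
    The discriminator minimises this loss; the feature extractor maximises it,
    e.g. through a gradient reversal layer, pulling all domains together.
    """
    total = 0.0
    for feat, d in zip(features, domain_ids):
        probs = softmax(domain_logits(feat, weights))
        total += -math.log(probs[d])
    return total / len(features)


def grl_coefficient(progress: float, gamma: float = 10.0) -> float:
    """DANN-style ramp 2 / (1 + exp(-gamma * p)) - 1 for progress p in [0, 1]."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0
```

With an untrained (all-zero) discriminator over two sources plus the target, every domain is equally likely and the loss equals log 3; the ramp starts at 0 and approaches 1, so the adversarial signal is weak early in training and strengthens later.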

📝 Abstract
This paper presents the winning approach for the 1st MultiModal Deception Detection (MMDD) Challenge at the 1st Workshop on Subtle Visual Computing (SVC). To address the domain shift between source and target domains, we propose a Multi-source Multimodal Progressive Domain Adaptation (MMPDA) framework that transfers audio-visual knowledge from diverse source domains to the target domain. By gradually aligning the source and target domains at both the feature and decision levels, our method bridges domain shifts across diverse multimodal datasets. Extensive experiments demonstrate the effectiveness of our approach, which secured a Top-2 place. On competition stage 2, our approach reaches 60.43% accuracy and a 56.99% F1-score, surpassing the 1st-place team by 5.59% on F1-score and the 3rd-place team by 6.75% on accuracy. Our code is available at https://github.com/RH-Lin/MMPDA.
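The reported figures are accuracy and F1-score on binary truthful/deceptive predictions. The sketch below shows how both metrics are computed; treating label 1 as "deceptive" and the exact F1 variant (single-class versus macro) used by the challenge are assumptions, since neither is specified here. The helper `implied_reference` is hypothetical and simply subtracts a margin, assuming the quoted margins are absolute percentage points.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of clips whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for one class (assumed here: 1 = deceptive)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for labels 0 and 1."""
    return (f1_binary(y_true, y_pred, 0) + f1_binary(y_true, y_pred, 1)) / 2


def implied_reference(our_score: float, margin_points: float) -> float:
    """Competitor score implied by an absolute percentage-point margin."""
    return round(our_score - margin_points, 2)
```

Read as absolute percentage points, the margins imply a first-place F1 of about 51.40% (56.99 minus 5.59) and a third-place accuracy of about 53.68% (60.43 minus 6.75).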
Problem

Addressing domain shift in audio-visual deception detection
Transferring multimodal knowledge across diverse source domains
Aligning feature and decision levels between domains progressively
Innovation

Progressive domain adaptation for multimodal alignment
Feature and decision level domain shift bridging
Multi-source audio-visual knowledge transfer framework
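The decision-level alignment listed above is not specified in detail on this page. One common way to express a category-consistency constraint, assumed here and not taken from the paper, is to penalize disagreement between the class probabilities that several classifier heads (for instance, one hypothetical head per source domain) assign to the same unlabeled target clip. The sketch measures this as the mean KL divergence from each head's prediction to the heads' average.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(head_probs: Sequence[Sequence[float]]) -> float:
    """Mean KL from each classifier head's prediction to the heads' average.

    head_probs[k] is the class-probability vector that head k assigns to the
    same target sample. The loss is zero (up to rounding) when all heads agree
    and grows as their predicted categories diverge.
    """
    num_heads = len(head_probs)
    num_classes = len(head_probs[0])
    mean = [sum(h[c] for h in head_probs) / num_heads for c in range(num_classes)]
    return sum(kl_divergence(h, mean) for h in head_probs) / num_heads
```

Heads that all predict [0.7, 0.3] give a loss of essentially zero, while heads predicting [0.9, 0.1] and [0.1, 0.9] give roughly 0.37; minimizing this term on target data pushes the heads toward the same category decision.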
Ronghao Lin
University of Science and Technology of China
Waveform Design, Sparse Array Design, Statistical Signal Processing, Optimization Theory
Sijie Mai
South China Normal University, Guangzhou, China
Ying Zeng
Sun Yat-sen University, Guangzhou, China
Qiaolin He
Sun Yat-sen University, Guangzhou, China
Aolin Xiong
Sun Yat-sen University, Guangzhou, China
Haifeng Hu
Sun Yat-sen University