Context-Aware Multimodal Claim Verification in Spoken Dialogues

📅 2026-06-09
🤖 AI Summary
This study addresses the challenge of fact-checking misleading claims that are built up over multi-turn interactions in conversational audio, a scenario that existing methods handle poorly. The work presents the first systematic investigation of claim verification in this setting and introduces a calibrated multimodal verification approach that fuses a context-aware audio encoder with a dialogue-aware text model. To support research in this domain, the authors construct MAD2, a new benchmark comprising 1,000 two-speaker dialogues, 3,368 check-worthy claims, and approximately 10 hours of audio. Experimental results indicate that dialogue structure influences verification performance more than the misinformation framing of the claims themselves. Notably, using only the preceding context often matches offline accuracy, which supports real-time verification and underscores the role of dialogue context in multimodal fact-checking.
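The difference between real-time and offline verification described above comes down to which turns the verifier may see. The following is a minimal sketch of that selection step, not the authors' code: the function name `context_for`, the mode labels, and the representation of a dialogue as a list of turn strings are all assumptions made for illustration.

```python
def context_for(turns, claim_idx, mode="preceding"):
    """Return the dialogue turns a verifier may use for the claim at claim_idx.

    Hypothetical helper; the mode names are illustrative assumptions:
      "none"      - the claim is verified in isolation
      "preceding" - only turns before the claim (live-moderation setting)
      "full"      - turns before and after the claim (offline setting)
    """
    if not 0 <= claim_idx < len(turns):
        raise IndexError("claim_idx is outside the dialogue")
    if mode == "none":
        return []
    before = turns[:claim_idx]
    if mode == "preceding":
        return before
    if mode == "full":
        return before + turns[claim_idx + 1:]
    raise ValueError(f"unknown mode: {mode}")


dialogue = ["turn 0", "turn 1", "turn 2 (claim)", "turn 3"]
live_context = context_for(dialogue, 2, mode="preceding")
offline_context = context_for(dialogue, 2, mode="full")
```

In this sketch the live setting never reads past the claim, so it can run while the conversation is still in progress.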
📝 Abstract
Every day, millions absorb claims from podcasts and streams that no fact-checker ever sees. Spoken misinformation is built through conversation, where credibility comes not from facts alone but from how claims are framed, reinforced, or left unchallenged across turns. Yet fact-checking has focused on isolated text, leaving dialogue audio under-studied. We introduce MAD2, a new Multi-turn Audio Dialogues benchmark for spoken claim verification, containing 1,000 two-speaker dialogues with 3,368 check-worthy claims and approximately 10 hours of audio, and propose calibrated multimodal fusion of a context-aware audio encoder and a dialogue-aware text model. Across settings, adding dialogue context improves verification, but the gains depend on scenario type. Using only preceding context often matches offline performance, supporting live-moderation settings, and audio contributes most when transcript-based models are destabilized by additional context. Overall, conversational structure matters more for verification than misinformation framing.
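The abstract does not say how the fusion is calibrated. As one plausible reading, the sketch below applies temperature scaling to each model's logits and then takes a weighted average of the resulting probabilities. The function names, the temperatures `t_audio` and `t_text`, and the weight `w_audio` are illustrative assumptions, not the paper's method; in practice such parameters would typically be fitted on held-out data.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a temperature above 1 flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(audio_logits, text_logits, t_audio=1.0, t_text=1.0, w_audio=0.5):
    """Late fusion (assumed scheme): calibrate each modality, then average.

    w_audio is the weight on the audio model; the text model gets 1 - w_audio.
    """
    if len(audio_logits) != len(text_logits):
        raise ValueError("both models must score the same label set")
    p_audio = softmax(audio_logits, t_audio)
    p_text = softmax(text_logits, t_text)
    return [w_audio * a + (1.0 - w_audio) * t for a, t in zip(p_audio, p_text)]


# Hypothetical logits over two labels, e.g. [supported, refuted].
fused = fuse([2.0, 0.5], [0.2, 1.1], t_audio=1.5, t_text=2.0, w_audio=0.4)
```

Averaging calibrated probabilities rather than raw logits keeps an overconfident modality from dominating the decision, which matches the abstract's observation that audio helps most when the transcript-based model is destabilized.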
Problem

Research questions and friction points this paper is trying to address.

spoken dialogue
claim verification
context-aware
multimodal
misinformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

context-aware multimodal fusion
spoken claim verification
multi-turn dialogue benchmark
audio-text modeling
live moderation