Active Light Modulation to Counter Manipulation of Speech Visual Content

📅 2025-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

High-impact live speech videos are vulnerable to visual forgeries of the speaker's lips and face. To address this, the paper proposes Spotlight, a system that embeds cryptographically secured physical signatures into video at capture time via high-speed, imperceptible light modulation, so that speaker identity and lip and facial motion can later be verified against the recorded content. Its core contributions are: (1) a physical-layer optical modulation scheme that embeds signatures into video while remaining imperceptible both in the recording and to live viewers; (2) a framework for generating compact (150-bit), pose-invariant speech video features that are semantically meaningful and cryptographically secured; and (3) a verification procedure combining locality-sensitive hashing, lightweight cryptographic binding, and robust feature extraction. Prototype evaluations achieve AUCs ≥ 0.99 and an overall true positive rate of 100% in detecting falsified videos, with strong robustness to varied recording conditions, video post-processing, and white-box adversarial attacks.

📝 Abstract

High-profile speech videos are prime targets for falsification, owing to their accessibility and influence. This work proposes Spotlight, a low-overhead and unobtrusive system for protecting live speech videos from visual falsification of speaker identity and lip and facial motion. Unlike predominant falsification detection methods, which operate in the digital domain, Spotlight creates dynamic physical signatures at the event site and embeds them into all video recordings via imperceptible modulated light. These physical signatures encode semantically meaningful features unique to the speech event, including the speaker's identity and facial motion, and are cryptographically secured to prevent spoofing. The signatures can be extracted from any video downstream and validated against the portrayed speech content to check its integrity. Key elements of Spotlight include (1) a framework for generating extremely compact (i.e., 150-bit), pose-invariant speech video features, based on locality-sensitive hashing; and (2) an optical modulation scheme that embeds more than 200 bps into video while remaining imperceptible both in the recorded video and to live viewers. Prototype experiments on extensive video datasets show that Spotlight achieves AUCs ≥ 0.99 and an overall true positive rate of 100% in detecting falsified videos. Further, Spotlight is highly robust across recording conditions, video post-processing techniques, and white-box adversarial attacks on its video feature extraction methodologies.
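
The abstract says the 150-bit features are based on locality-sensitive hashing but does not give the construction. As a rough illustration of how LSH can compress a feature vector into a short bit string that tolerates small perturbations, here is a minimal sketch using random-hyperplane hashing (SimHash). This is a generic technique swapped in for illustration, not the paper's actual method: the 64-dimensional feature vector, the noise level, and all function names are hypothetical assumptions.

```python
import numpy as np

N_BITS = 150  # signature length quoted in the abstract


def make_hyperplanes(dim, n_bits=N_BITS, seed=0):
    """Draw one random hyperplane per output bit (assumed shared by signer and verifier)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))


def lsh_signature(features, hyperplanes):
    """Each bit records which side of a hyperplane the feature vector falls on."""
    return (hyperplanes @ features >= 0).astype(np.uint8)


def hamming_distance(a, b):
    """Number of bit positions where two signatures differ."""
    return int(np.count_nonzero(a != b))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dim = 64  # hypothetical feature dimension, not taken from the paper
    planes = make_hyperplanes(dim)

    genuine = rng.standard_normal(dim)                      # features at the event site
    rerecorded = genuine + 0.05 * rng.standard_normal(dim)  # same content, slight noise
    forged = rng.standard_normal(dim)                       # unrelated facial motion

    sig = lsh_signature(genuine, planes)
    print("genuine vs. re-recorded:", hamming_distance(sig, lsh_signature(rerecorded, planes)))
    print("genuine vs. forged:     ", hamming_distance(sig, lsh_signature(forged, planes)))
```

In a scheme of this kind, a verifier would accept a video when the Hamming distance between the embedded signature and the signature recomputed from the video falls below a threshold; how Spotlight sets that decision rule is not stated in the abstract.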

Problem

Research questions and friction points this paper is trying to address.

Prevent visual falsification of live speech videos

Embed secure physical signatures via modulated light

Detect manipulated speaker identity and facial motion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic physical signatures via modulated light

Cryptographically secured compact speech features

Imperceptible optical embedding exceeding 200 bps
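
To make the figures above concrete, the sketch below works out how long one 150-bit signature would take to embed at 200 bps, and shows one way a compact signature could be bound to a secret key so that it cannot be spoofed. The abstract does not say which cryptographic primitive Spotlight uses; the HMAC-SHA256 tag, the 64-bit tag length, the per-window counter, and every function name here are assumptions made purely for illustration.

```python
import hashlib
import hmac

SIGNATURE_BITS = 150   # feature size quoted in the abstract
EMBED_RATE_BPS = 200   # lower bound on the embedding rate quoted in the abstract


def seconds_to_embed(n_bits, rate_bps=EMBED_RATE_BPS):
    """Time needed to transmit n_bits at the given optical data rate."""
    return n_bits / rate_bps


def tag_signature(signature, key, window_index, tag_bytes=8):
    """Hypothetical binding: truncated HMAC over a window counter plus the signature."""
    message = window_index.to_bytes(8, "big") + signature
    return hmac.new(key, message, hashlib.sha256).digest()[:tag_bytes]


def verify_signature(signature, key, window_index, tag):
    """Recompute the tag and compare in constant time."""
    expected = tag_signature(signature, key, window_index, len(tag))
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    print(seconds_to_embed(SIGNATURE_BITS))           # signature alone
    print(seconds_to_embed(SIGNATURE_BITS + 8 * 8))   # signature plus an assumed 64-bit tag

    key = b"event-site-secret"   # hypothetical key held by the signer and verifier
    signature = bytes(19)        # 150 bits padded to 19 bytes, placeholder content
    tag = tag_signature(signature, key, window_index=0)
    print(verify_signature(signature, key, 0, tag))
```

At 200 bps a bare 150-bit signature fits in 0.75 s, so a rate above 200 bps leaves room for some authentication or error-correction overhead while still refreshing the signature roughly once per second; the actual payload layout is not given in this summary.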
