🤖 AI Summary
This work proposes ComVi, a system for contextual, time-synchronized video commenting that addresses a common issue on general video-sharing platforms: comments displayed independently of playback can refer to moments unrelated to the current scene, revealing spoilers and disrupting viewer immersion. ComVi first maps each comment to a relevant video timestamp by computing audio-visual correlation, then constructs the display sequence through an optimization that jointly considers temporal relevance, popularity (number of likes), and display duration for comfortable reading. In a user study, ComVi provided a significantly more engaging experience than conventional interfaces (YouTube's comment section and a danmaku-style display), with 71.9% of participants selecting ComVi as their most preferred interface.
📝 Abstract
On general video-sharing platforms like YouTube, comments are displayed independently of video playback. As viewers often read comments while watching a video, they may encounter comments referring to moments unrelated to the current scene, which can reveal spoilers and disrupt immersion. To address this problem, we present ComVi, a novel system that displays comments at contextually relevant moments, enabling viewers to see time-synchronized comments and video content together. We first map all comments to relevant video timestamps by computing audio-visual correlation, then construct the comment sequence through an optimization that considers temporal relevance, popularity (number of likes), and display duration for comfortable reading. In a user study, ComVi provided a significantly more engaging experience than conventional video interfaces (i.e., YouTube and Danmaku), with 71.9% of participants selecting ComVi as their most preferred interface.
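The comment-sequencing step described in the abstract could be sketched roughly as below. This is an illustrative approximation, not the paper's actual method: the scoring weights, the 10-second temporal decay scale, the relevance threshold, and the assumed reading speed are all hypothetical choices, and the greedy scheduler stands in for whatever optimization ComVi actually uses.

```python
import math
from dataclasses import dataclass

READING_WPM = 200.0  # assumed average reading speed (words per minute)

@dataclass
class Comment:
    text: str
    mapped_time: float  # timestamp (s) assigned by audio-visual correlation
    likes: int

def display_duration(c: Comment) -> float:
    """Seconds needed to read the comment comfortably (min 2 s)."""
    words = len(c.text.split())
    return max(2.0, words / READING_WPM * 60.0)

def score(c: Comment, t: float, max_likes: int,
          w_time: float = 0.7, w_pop: float = 0.3) -> float:
    """Combine temporal relevance (decays with distance from the comment's
    mapped timestamp) with popularity (likes normalized to [0, 1])."""
    temporal = math.exp(-abs(t - c.mapped_time) / 10.0)  # assumed 10 s decay
    popularity = c.likes / max_likes if max_likes else 0.0
    return w_time * temporal + w_pop * popularity

def schedule(comments: list[Comment]) -> list[tuple[float, Comment]]:
    """Greedily build a non-overlapping display sequence: walk comments in
    timestamp order, showing each one that still scores above a threshold
    at its earliest available display time."""
    sequence: list[tuple[float, Comment]] = []
    t = 0.0
    max_likes = max((c.likes for c in comments), default=0)
    for c in sorted(comments, key=lambda c: c.mapped_time):
        start = max(t, c.mapped_time)  # wait for the previous comment to finish
        if score(c, start, max_likes) > 0.2:  # assumed relevance threshold
            sequence.append((start, c))
            t = start + display_duration(c)
    return sequence
```

In this sketch, a comment whose display slot has drifted too far from its mapped timestamp (so its temporal-relevance term has decayed) is simply dropped, which is one simple way to keep comments aligned with the current scene.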