VideoMix: Aggregating How-To Videos for Task-Oriented Learning

📅 2025-03-24
🏛️ Proceedings of the 30th International Conference on Intelligent User Interfaces
🤖 AI Summary
To address the cognitive load and time inefficiency caused by fragmented, unstructured tutorial videos, this paper proposes VideoMix, the first task-centric framework for multi-video collaborative understanding. Leveraging vision-language models (VLMs), VideoMix performs cross-video, frame-level semantic parsing, extracts and aligns key instructional elements (materials, steps, alternatives, outcomes), and constructs a task-driven knowledge graph to generate structured multimodal summaries and navigable video segment links. Its novelty lies in the first application of VLMs to knowledge alignment and interpretable summarization across heterogeneous "how-to" videos, which enables outcome-oriented learning path construction. A user study demonstrates that, compared to sequential video viewing, VideoMix improves task comprehension completeness by 42%, reduces learning time by 37%, and significantly enhances information retrieval efficiency.

๐Ÿ“ Abstract
Tutorial videos are a valuable resource for people looking to learn new tasks. People often learn these skills by viewing multiple tutorial videos to get an overall understanding of a task by looking at different approaches to achieve the task. However, navigating through multiple videos can be time-consuming and mentally demanding as these videos are scattered and not easy to skim. We propose VideoMix, a system that helps users gain a holistic understanding of a how-to task by aggregating information from multiple videos on the task. Insights from our formative study (N=12) reveal that learners value understanding potential outcomes, required materials, alternative methods, and important details shared by different videos. Powered by a Vision-Language Model pipeline, VideoMix extracts and organizes this information, presenting concise textual summaries alongside relevant video clips, enabling users to quickly digest and navigate the content. A comparative user study (N=12) demonstrated that VideoMix enabled participants to gain a more comprehensive understanding of tasks with greater efficiency than a baseline video interface, where videos are viewed independently. Our findings highlight the potential of a task-oriented, multi-video approach where videos are organized around a shared goal, offering an enhanced alternative to conventional video-based learning.
Problem

Research questions and friction points this paper is trying to address.

Aggregating scattered how-to videos for efficient learning
Extracting key task details from multiple tutorial videos
Organizing multi-video content for comprehensive task understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregates information from multiple how-to videos on the same task
Uses a Vision-Language Model pipeline to extract and organize that information
Presents concise textual summaries alongside relevant video clips
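
As a rough illustration of the aggregation idea, the sketch below groups per-video extractions into the four kinds of information the formative study highlights (outcomes, materials, alternative methods, important details) and keeps a clip reference for every video that mentions an item. It is not the authors' implementation. All names here (Clip, Item, SummaryEntry, aggregate, the category labels) are hypothetical, the extraction step is assumed to have already happened, and matching items by normalized text is only a simple stand-in for whatever alignment the Vision-Language Model pipeline performs.

```python
from dataclasses import dataclass, field

# Hypothetical category labels, based on the information types named in the abstract.
CATEGORIES = ("outcomes", "materials", "methods", "details")


@dataclass(frozen=True)
class Clip:
    """A segment of one source video, in seconds."""
    video_id: str
    start_s: float
    end_s: float


@dataclass
class Item:
    """One piece of information extracted from one video (extraction is assumed)."""
    category: str
    text: str
    clip: Clip


@dataclass
class SummaryEntry:
    """One line of the aggregated summary plus every clip that supports it."""
    text: str
    clips: list = field(default_factory=list)


def _key(text):
    # Assumption: case- and whitespace-insensitive matching stands in for real alignment.
    return " ".join(text.lower().split())


def aggregate(items):
    """Group extracted items by category and merge items that share the same text."""
    grouped = {category: {} for category in CATEGORIES}
    for item in items:
        if item.category not in grouped:
            raise ValueError(f"unknown category: {item.category!r}")
        entries = grouped[item.category]
        entry = entries.setdefault(_key(item.text), SummaryEntry(text=item.text))
        entry.clips.append(item.clip)

    def video_count(entry):
        return len({clip.video_id for clip in entry.clips})

    # Within each category, list entries mentioned by more videos first.
    return {
        category: sorted(entries.values(), key=video_count, reverse=True)
        for category, entries in grouped.items()
    }
```

Calling aggregate on items extracted from several videos returns, for each category, a list of entries whose clips field links back to the supporting segments, which is the shape a summary-plus-clips interface would need.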