VideoMix: Aggregating How-To Videos for Task-Oriented Learning

📅 2025-03-24

🏛️ Proceedings of the 30th International Conference on Intelligent User Interfaces

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the cognitive load and time inefficiency caused by fragmented, unstructured tutorial videos, this paper proposes VideoMix—the first task-centric framework for multi-video collaborative understanding. Leveraging vision-language models (VLMs), VideoMix performs cross-video frame-level semantic parsing, extracts and aligns key instructional elements (materials, steps, alternatives, outcomes), and constructs a task-driven knowledge graph to generate structured multimodal summaries and navigable video segment links. Its novelty lies in the first application of VLMs to knowledge alignment and interpretable summarization across heterogeneous “how-to” videos, enabling outcome-oriented learning path construction. A user study demonstrates that, compared to sequential video viewing, VideoMix improves task comprehension completeness by 42%, reduces learning time by 37%, and significantly enhances information retrieval efficiency.

Technology Category

Application Category

📝 Abstract

Tutorial videos are a valuable resource for people looking to learn new tasks. People often learn these skills by viewing multiple tutorial videos to get an overall understanding of a task by looking at different approaches to achieve the task. However, navigating through multiple videos can be time-consuming and mentally demanding as these videos are scattered and not easy to skim. We propose VideoMix, a system that helps users gain a holistic understanding of a how-to task by aggregating information from multiple videos on the task. Insights from our formative study (N=12) reveal that learners value understanding potential outcomes, required materials, alternative methods, and important details shared by different videos. Powered by a Vision-Language Model pipeline, VideoMix extracts and organizes this information, presenting concise textual summaries alongside relevant video clips, enabling users to quickly digest and navigate the content. A comparative user study (N=12) demonstrated that VideoMix enabled participants to gain a more comprehensive understanding of tasks with greater efficiency than a baseline video interface, where videos are viewed independently. Our findings highlight the potential of a task-oriented, multi-video approach where videos are organized around a shared goal, offering an enhanced alternative to conventional video-based learning.

Problem

Research questions and friction points this paper is trying to address.

Aggregating scattered how-to videos for efficient learning

Extracting key task details from multiple tutorial videos

Organizing multi-video content for comprehensive task understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregates multiple how-to videos for learning

Uses Vision-Language Model for information extraction

Presents concise summaries with relevant video clips

🔎 Similar Papers

No similar papers found.

Authors to Follow