🤖 AI Summary
This study investigates how keyframe layout design shapes the trade-off between user efficiency and accuracy in Visual Known-Item Search. For homogeneous video collections and text-based query tasks, a controlled study with 49 participants compares seven grid- and grouping-based keyframe layouts populated with rankings from multimodal deep-learning models. A video-grouped layout is the most efficient, while a four-column, rank-preserving grid achieves the highest accuracy. Sorted grids let users skip irrelevant regions quickly, but they push relevant targets to less prominent positions, which delays first arrival at the target and increases overlooks. The results expose a tension between preserving rank positions and reorganizing the grid by sorting or grouping. They motivate hybrid layouts that keep top-ranked items in place while sorting or grouping the remainder, and they offer evidence-based guidance for known-item search interfaces.
📝 Abstract
Multimodal deep-learning models power interactive video retrieval by ranking keyframes in response to textual queries. Despite these advances, users must still browse the ranked candidates manually to locate a target. The arrangement of keyframes within the search grid strongly affects browsing effectiveness and user efficiency, yet it remains underexplored. We report a study with 49 participants evaluating seven keyframe layouts for the Visual Known-Item Search task. Beyond efficiency and accuracy, we relate browsing phenomena, such as overlooks, to layout characteristics. Our results show that a video-grouped layout is the most efficient, while a four-column, rank-preserving grid achieves the highest accuracy. Sorted grids show both potential and trade-offs: they enable rapid scanning of uninteresting regions but move relevant targets to less prominent positions, which delays first arrival times and increases overlooks.
These findings motivate hybrid designs that preserve positions of top-ranked items while sorting or grouping the remainder, and offer guidance for searching in grids beyond video retrieval.
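The hybrid idea can be illustrated with a minimal sketch. It is not the layout evaluated in the study: the `Keyframe` fields, the `top_k` and `columns` parameters, and the choice to group the remainder by video and order each group by frame index are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Keyframe(NamedTuple):
    # Hypothetical record; field names are illustrative only.
    video_id: str
    frame_idx: int
    rank: int  # 0 = most relevant according to the retrieval model


def hybrid_layout(ranked, top_k=8, columns=4):
    """Keep the top_k items in rank order, group the rest by video.

    Returns the grid as a list of rows, each with at most `columns` items.
    """
    ordered = sorted(ranked, key=lambda kf: kf.rank)
    head, tail = ordered[:top_k], ordered[top_k:]

    # Group the remaining keyframes by source video. Because dicts keep
    # insertion order, groups appear in order of their best-ranked member.
    groups = defaultdict(list)
    for kf in tail:
        groups[kf.video_id].append(kf)

    grouped_tail = []
    for frames in groups.values():
        # Within a video, show frames in temporal order (an assumption).
        grouped_tail.extend(sorted(frames, key=lambda kf: kf.frame_idx))

    flat = head + grouped_tail
    return [flat[i:i + columns] for i in range(0, len(flat), columns)]
```

With `top_k=2`, the two best-ranked keyframes stay in the first two grid cells regardless of their source video, and every later cell is filled video by video. Raising `top_k` shifts the layout toward a plain rank-preserving grid, while `top_k=0` gives a fully grouped one.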