🤖 AI Summary
This paper surveys core challenges in egocentric vision, the emerging field of visual understanding from a first-person perspective. We organize the field's tasks into a four-part taxonomy: subject understanding, object understanding, environment understanding, and hybrid understanding. For each category, we review the technical approaches, identify shared bottlenecks (including egocentric viewpoint bias, difficulties in temporal modeling, and high annotation costs), and chart current trends. Drawing on work that uses multimodal data from wearable sensors, the survey brings together methods from computer vision, behavioral analysis, and scene understanding, and covers major datasets, model architectures, and evaluation metrics. It is intended as a reference and a practical technology roadmap for applications in augmented/virtual reality (AR/VR) and embodied intelligence.
📝 Abstract
With the rapid development of artificial intelligence technologies and wearable devices, egocentric vision understanding has emerged as a new and challenging research direction, attracting growing attention from both academia and industry. Egocentric vision captures visual and multimodal data through cameras or sensors worn on the human body, offering a unique perspective that simulates human visual experience. This paper provides a comprehensive survey of research on egocentric vision understanding, systematically analyzing the components of egocentric scenes and categorizing the tasks into four main areas: subject understanding, object understanding, environment understanding, and hybrid understanding. We explore in detail the sub-tasks within each category. We also summarize the main challenges and current trends in the field. Furthermore, this paper presents an overview of high-quality egocentric vision datasets, offering valuable resources for future research. By summarizing the latest advancements, we anticipate broad applications of egocentric vision technologies in fields such as augmented reality, virtual reality, and embodied intelligence, and propose future research directions based on these developments.