X-Ego: Acquiring Team-Level Tactical Situational Awareness via Cross-Egocentric Contrastive Video Representation Learning

📅 2025-10-21
🤖 AI Summary
This work addresses team-level tactical situational awareness modeling from multi-agent egocentric videos to support collaborative decision-making in complex 3D environments. We introduce X-Ego-CS, a benchmark dataset comprising 124 hours of gameplay footage from 45 professional-level Counter-Strike 2 matches, with synchronized first-person video streams for all players and their state-action trajectories. Building on this dataset, we propose Cross-Ego Contrastive Learning (CECL), a contrastive framework that aligns teammates' egocentric visual streams so that a single player's view carries team-level context. We evaluate CECL on a teammate-opponent location prediction task with state-of-the-art video encoders, where it improves an agent's ability to infer teammate and opponent positions from a single first-person view. Together, the dataset and method provide a foundation for cross-egocentric multi-agent benchmarking, with implications for spatiotemporal reasoning and human-AI teaming in immersive 3D settings.

📝 Abstract
Human team tactics emerge from each player's individual perspective and their ability to anticipate, interpret, and adapt to teammates' intentions. While advances in video understanding have improved the modeling of team interactions in sports, most existing work relies on third-person broadcast views and overlooks the synchronous, egocentric nature of multi-agent learning. We introduce X-Ego-CS, a benchmark dataset consisting of 124 hours of gameplay footage from 45 professional-level matches of the popular e-sports game Counter-Strike 2, designed to facilitate research on multi-agent decision-making in complex 3D environments. X-Ego-CS provides cross-egocentric video streams that synchronously capture all players' first-person perspectives along with state-action trajectories. Building on this resource, we propose Cross-Ego Contrastive Learning (CECL), which aligns teammates' egocentric visual streams to foster team-level tactical situational awareness from an individual's perspective. We evaluate CECL on a teammate-opponent location prediction task, demonstrating its effectiveness in enhancing an agent's ability to infer both teammate and opponent positions from a single first-person view using state-of-the-art video encoders. Together, X-Ego-CS and CECL establish a foundation for cross-egocentric multi-agent benchmarking in esports. More broadly, our work positions gameplay understanding as a testbed for multi-agent modeling and tactical learning, with implications for spatiotemporal reasoning and human-AI teaming in both virtual and real-world domains. Code and dataset are available at https://github.com/HATS-ICT/x-ego.
Problem

Research questions and friction points this paper is trying to address.

Modeling team tactics from synchronized first-person video perspectives
Predicting teammate and opponent positions from individual viewpoints
Developing cross-egocentric learning for multi-agent tactical awareness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-egocentric video streams synchronizing all players' perspectives
Cross-Ego Contrastive Learning aligning teammates' visual streams
First-person view teammate-opponent location prediction using video encoders
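
The idea of aligning teammates' visual streams can be illustrated with a generic InfoNCE-style contrastive loss. This is a minimal sketch under stated assumptions, not the paper's actual CECL objective: it assumes each clip has already been encoded into an embedding vector, treats two teammates' embeddings from the same time step as a positive pair, and treats embeddings from other time steps in the batch as negatives. The function name `cross_ego_info_nce` and the `temperature` parameter are hypothetical and chosen for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_ego_info_nce(anchor_embs, teammate_embs, temperature=0.1):
    """Generic InfoNCE loss (hypothetical stand-in for a cross-ego objective).

    anchor_embs[i] and teammate_embs[i] are assumed to be embeddings of two
    teammates' first-person clips at the same time step i (a positive pair).
    teammate_embs[j] for j != i serve as negatives for anchor i.
    """
    anchors = [l2_normalize(v) for v in anchor_embs]
    mates = [l2_normalize(v) for v in teammate_embs]
    losses = []
    for i, anchor in enumerate(anchors):
        # Cosine similarities scaled by temperature act as logits.
        logits = [dot(anchor, mate) / temperature for mate in mates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the synchronized teammate as the target.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy example: two time steps, 2-D embeddings.
player_a = [[1.0, 0.0], [0.0, 1.0]]
player_b_synced = [[1.0, 0.0], [0.0, 1.0]]    # matches player_a per time step
player_b_shuffled = [[0.0, 1.0], [1.0, 0.0]]  # time steps swapped

loss_synced = cross_ego_info_nce(player_a, player_b_synced)
loss_shuffled = cross_ego_info_nce(player_a, player_b_shuffled)
print(loss_synced < loss_shuffled)
```

In this toy setup the loss is lower when the two players' embeddings agree at each time step than when the time steps are swapped, which is the behavior a contrastive alignment objective is meant to reward. A real pipeline would compute the embeddings with a video encoder and backpropagate through it.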