🤖 AI Summary
Conventional image understanding focuses on surface-level recognition of people, objects, and scenes, neglecting the contextual, temporal, and semantic dimensions essential for event comprehension. Method: The EVENTA Grand Challenge targets event-level multimodal understanding, centered on the “who, when, where, what, why” behind an image, to move beyond static recognition. It introduces the first large-scale event-level multimodal benchmark, built upon OpenEvents V1, and defines two tracks: Event-Enriched Image Retrieval and Captioning, and Event-Based Image Retrieval. Contribution/Results: The challenge attracted 45 teams from six countries and used a two-phase (Public/Private Test) evaluation protocol to ensure fairness and reproducibility. The top three teams were invited to present their solutions at ACM Multimedia 2025. This work establishes a foundation for context-aware, narrative-driven multimedia AI, with applications including journalism, media analysis, cultural archiving, and accessibility.
📝 Abstract
The Event-Enriched Image Analysis (EVENTA) Grand Challenge, hosted at ACM Multimedia 2025, introduces the first large-scale benchmark for event-level multimodal understanding. Traditional captioning and retrieval tasks largely focus on surface-level recognition of people, objects, and scenes, often overlooking the contextual and semantic dimensions that define real-world events. EVENTA addresses this gap by integrating contextual, temporal, and semantic information to capture the who, when, where, what, and why behind an image. Built upon the OpenEvents V1 dataset, the challenge features two tracks: Event-Enriched Image Retrieval and Captioning, and Event-Based Image Retrieval. A total of 45 teams from six countries participated, with evaluation conducted through Public and Private Test phases to ensure fairness and reproducibility. The top three teams were invited to present their solutions at ACM Multimedia 2025. EVENTA establishes a foundation for context-aware, narrative-driven multimedia AI, with applications in journalism, media analysis, cultural archiving, and accessibility. Further details about the challenge are available at the official homepage: https://ltnghia.github.io/eventa/eventa-2025.