ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition

📅 2025-04-04
🤖 AI Summary
Open-world egocentric activity recognition requires reasoning about unknown activities on the fly. The main difficulty is searching efficiently in a partially observable, unconstrained semantic space. To address this, we propose a probabilistic residual search framework featuring: (i) a stochastic search mechanism grounded in jump-diffusion processes; (ii) a structured semantic space guided by commonsense priors; and (iii) adaptive prediction refinement based on vision-language models (VLMs). Our method unifies prior-guided exploration with likelihood-driven exploitation. It achieves state-of-the-art performance on benchmarks including GTEA Gaze and EPIC-Kitchens, demonstrates robustness across four openness levels (L0–L3), and establishes a systematic taxonomy for open-world egocentric recognition with three hierarchical dimensions: taxonomy modeling, dynamic reasoning, and out-of-distribution generalization.

📝 Abstract
Open-world egocentric activity recognition poses a fundamental challenge due to its unconstrained nature, requiring models to infer unseen activities from an expansive, partially observed search space. We introduce ProbRes, a Probabilistic Residual search framework based on jump-diffusion that efficiently navigates this space by balancing prior-guided exploration with likelihood-driven exploitation. Our approach integrates structured commonsense priors to construct a semantically coherent search space, adaptively refines predictions using Vision-Language Models (VLMs), and employs a stochastic search mechanism to locate high-likelihood activity labels efficiently without exhaustive enumeration. We systematically evaluate ProbRes across multiple openness levels (L0–L3), demonstrating its adaptability to increasing search space complexity. In addition to achieving state-of-the-art performance on benchmark datasets (GTEA Gaze, GTEA Gaze+, EPIC-Kitchens, and Charades-Ego), we establish a clear taxonomy for open-world recognition, delineating the challenges and methodological advancements necessary for egocentric activity understanding. Our results highlight the importance of structured search strategies, paving the way for scalable and efficient open-world activity recognition.
Problem

Research questions and friction points this paper is trying to address.

Recognizing unseen activities in open-world egocentric videos
Balancing exploration and exploitation in large search spaces
Adapting to increasing complexity in activity recognition tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic Residual search framework
Structured commonsense priors integration
Stochastic search mechanism optimization
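
The stochastic search idea listed above can be illustrated with a toy sketch. This is not the authors' implementation: the function name `jump_diffusion_search`, the neighbor graph, the prior table, and the likelihood table are all hypothetical stand-ins (in the paper's setting the likelihood would come from a VLM and the prior from commonsense knowledge). The sketch only shows the general pattern of mixing prior-guided "jumps" with local, likelihood-driven "diffusion" moves so that only part of the label space gets scored.

```python
import random


def jump_diffusion_search(labels, neighbors, prior, likelihood,
                          steps=500, jump_prob=0.2, seed=0):
    """Toy jump-diffusion-style search over a discrete label space.

    labels:     list of candidate activity labels
    neighbors:  dict mapping each label to semantically nearby labels (assumed given)
    prior:      dict mapping each label to a prior weight (assumed given)
    likelihood: callable label -> positive score (stand-in for a VLM score)
    Returns the best-scoring label seen and the number of labels actually scored.
    """
    rng = random.Random(seed)
    weights = [prior[label] for label in labels]

    # Start from a label drawn from the prior and score it once.
    current = rng.choices(labels, weights=weights)[0]
    scores = {current: likelihood(current)}

    for _ in range(steps):
        if rng.random() < jump_prob:
            # Jump: prior-guided exploration anywhere in the label space.
            candidate = rng.choices(labels, weights=weights)[0]
        else:
            # Diffusion: local move to a semantically nearby label.
            candidate = rng.choice(neighbors[current])

        # Score each label at most once; repeated visits reuse the cached value.
        if candidate not in scores:
            scores[candidate] = likelihood(candidate)

        # Metropolis-style acceptance driven by the likelihood ratio.
        ratio = scores[candidate] / max(scores[current], 1e-12)
        if rng.random() < min(1.0, ratio):
            current = candidate

    best = max(scores, key=scores.get)
    return best, len(scores)


# Hypothetical toy problem: five labels, hand-written neighbors, prior, and scores.
toy_labels = ["cut onion", "cut tomato", "pour water", "pour milk", "open fridge"]
toy_neighbors = {
    "cut onion": ["cut tomato"],
    "cut tomato": ["cut onion", "pour water"],
    "pour water": ["pour milk", "cut tomato"],
    "pour milk": ["pour water", "open fridge"],
    "open fridge": ["pour milk"],
}
toy_prior = {"cut onion": 0.3, "cut tomato": 0.2, "pour water": 0.2,
             "pour milk": 0.2, "open fridge": 0.1}
toy_scores = {"cut onion": 0.05, "cut tomato": 0.10, "pour water": 0.40,
              "pour milk": 0.90, "open fridge": 0.20}

toy_best, toy_evaluated = jump_diffusion_search(
    toy_labels, toy_neighbors, toy_prior, toy_scores.get, steps=2000)
```

In this sketch, `jump_prob` controls the balance between exploration and exploitation: higher values lean on the prior, lower values lean on local likelihood-guided refinement. Caching scores means the count of scored labels, not the number of steps, reflects how much of the space was enumerated.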