A Probabilistic Jump-Diffusion Framework for Open-World Egocentric Activity Recognition

📅 2025-05-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Open-world first-person activity recognition faces fundamental challenges: an unbounded activity space, partial observability, and poor generalization to unseen activities. To address these, we propose ProbRes, the first probabilistic residual search framework grounded in jump-diffusion processes. It constructs a semantically coherent and generalizable search space by integrating structured commonsense priors with adaptive vision-language model (VLM) feedback, thereby avoiding exhaustive enumeration. A novel probabilistic residual optimization mechanism enables robust inference over unseen activities. Evaluated on benchmarks including GTEA Gaze and EPIC-Kitchens, ProbRes achieves state-of-the-art performance. Furthermore, we are the first to formally define four levels of openness (L0–L3), establishing a methodological framework for open-world first-person activity recognition.

πŸ“ Abstract
Open-world egocentric activity recognition poses a fundamental challenge due to its unconstrained nature, requiring models to infer unseen activities from an expansive, partially observed search space. We introduce ProbRes, a Probabilistic Residual search framework based on jump-diffusion that efficiently navigates this space by balancing prior-guided exploration with likelihood-driven exploitation. Our approach integrates structured commonsense priors to construct a semantically coherent search space, adaptively refines predictions using Vision-Language Models (VLMs), and employs a stochastic search mechanism to locate high-likelihood activity labels while avoiding exhaustive enumeration. We systematically evaluate ProbRes across multiple openness levels (L0–L3), demonstrating its adaptability to increasing search space complexity. In addition to achieving state-of-the-art performance on benchmark datasets (GTEA Gaze, GTEA Gaze+, EPIC-Kitchens, and Charades-Ego), we establish a clear taxonomy for open-world recognition, delineating the challenges and methodological advancements necessary for egocentric activity understanding.
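The abstract frames the search as balancing prior-guided exploration with likelihood-driven exploitation. As a rough illustration only, the sketch below runs a Metropolis-Hastings-style jump-diffusion search over a toy (verb, noun) label space. "Jump" moves resample a whole label from a commonsense prior, and "diffusion" moves change a single slot. The vocabulary, the `PLAUSIBLE` prior table, the `toy_likelihood` stand-in for a VLM score, and the parameters `steps` and `p_jump` are all hypothetical assumptions, not ProbRes's actual components. The probabilistic residual mechanism is not modeled.

```python
import math
import random

# Toy label space of (verb, noun) pairs; purely illustrative, not from the paper.
VERBS = ["take", "put", "open", "close", "cut", "wash"]
NOUNS = ["cup", "knife", "fridge", "bread", "plate", "tap"]

# Hypothetical commonsense prior: plausible pairs weigh 1.0, all others 0.05.
PLAUSIBLE = {("open", "fridge"), ("close", "fridge"), ("cut", "bread"),
             ("take", "cup"), ("take", "knife"), ("wash", "plate"), ("open", "tap")}


def prior(label):
    return 1.0 if label in PLAUSIBLE else 0.05


def toy_likelihood(label, observed=("cut", "bread")):
    """Hypothetical stand-in for a VLM score: rewards a matching verb and noun."""
    verb, noun = label
    return math.exp(2.0 * (verb == observed[0]) + 2.0 * (noun == observed[1]))


def jump_diffusion_search(steps=500, p_jump=0.3, seed=0):
    rng = random.Random(seed)
    labels = [(v, n) for v in VERBS for n in NOUNS]
    weights = [prior(label) for label in labels]
    cache = {}  # label -> likelihood; its size counts distinct (simulated) VLM queries

    def lik(label):
        if label not in cache:
            cache[label] = toy_likelihood(label)
        return cache[label]

    def posterior(label):
        return prior(label) * lik(label)

    current = rng.choices(labels, weights=weights)[0]
    best = current
    for _ in range(steps):
        if rng.random() < p_jump:
            # Jump (exploration): draw a whole new label from the prior.
            proposal = rng.choices(labels, weights=weights)[0]
            # Independence proposal q = prior, so the MH ratio reduces to a likelihood ratio.
            ratio = lik(proposal) / lik(current)
        else:
            # Diffusion (exploitation): change one slot to a different value.
            verb, noun = current
            if rng.random() < 0.5:
                verb = rng.choice([v for v in VERBS if v != verb])
            else:
                noun = rng.choice([n for n in NOUNS if n != noun])
            proposal = (verb, noun)
            # Symmetric proposal, so the MH ratio is the posterior ratio.
            ratio = posterior(proposal) / posterior(current)
        if rng.random() < min(1.0, ratio):
            current = proposal
        # Keep the best-scoring label among everything proposed so far.
        if posterior(proposal) > posterior(best):
            best = proposal
    return best, len(cache)


best, n_queries = jump_diffusion_search()
print(best, n_queries)
```

Each move type is a valid MH kernel on its own, so choosing between them with a fixed probability keeps the prior-times-likelihood posterior as the stationary distribution. Caching the score means repeated proposals cost no extra (hypothetical) VLM calls. Here the 36-pair space is small enough to list for the prior weights; a genuinely open-world space could not be enumerated this way.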
Problem

Research questions and friction points this paper is trying to address.

Recognizing unseen egocentric activities in unconstrained open-world settings
Balancing exploration and exploitation in large search spaces efficiently
Integrating commonsense priors and VLMs for adaptive activity prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic Residual search balances exploration and exploitation
Integrates commonsense priors and Vision-Language Models
Stochastic search avoids exhaustive enumeration of the label space