Seeing Beyond the Scene: Analyzing and Mitigating Background Bias in Action Recognition

📅 2025-12-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Action recognition models often exhibit background bias: they rely on scene context rather than human movement and pose, which undermines generalization and robustness. This work systematically analyzes background bias across three model families: standard classification models, contrastive text-image pretrained models (e.g., CLIP), and video large language models (VLLMs), and finds that all three tend to default to background reasoning. Two mitigation strategies are studied: (1) for classification models, feeding segmented human input so that background content is suppressed, and (2) for VLLMs, manual and automated prompt tuning that steers predictions toward human-focused reasoning. Mitigation efficacy is assessed quantitatively. Segmented human input decreases background bias by 3.78% for classification models, and prompt design shifts VLLM predictions toward human-focused reasoning by 9.85%.

📝 Abstract

Human action recognition models often rely on background cues rather than human movement and pose to make predictions, a behavior known as background bias. We present a systematic analysis of background bias across classification models, contrastive text-image pretrained models, and Video Large Language Models (VLLMs), and find that all exhibit a strong tendency to default to background reasoning. Next, we propose mitigation strategies for classification models and show that incorporating segmented human input effectively decreases background bias by 3.78%. Finally, we explore manual and automated prompt tuning for VLLMs, demonstrating that prompt design can shift predictions toward human-focused reasoning by 9.85%.
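The idea of "segmented human input" can be illustrated with a minimal, dependency-free sketch. It assumes a binary human mask per frame is already available from some segmentation model; the function names, the nested-list frame representation, and the constant fill value are illustrative assumptions, not the paper's implementation.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of pixel intensities (assumed layout)
Mask = List[List[int]]   # 1 where a human was segmented, 0 elsewhere (assumed layout)


def mask_background(frame: Frame, human_mask: Mask, fill: int = 0) -> Frame:
    """Keep pixels inside the human mask and replace all others with `fill`."""
    if len(frame) != len(human_mask):
        raise ValueError("frame and mask must have the same height")
    out: Frame = []
    for row, mask_row in zip(frame, human_mask):
        if len(row) != len(mask_row):
            raise ValueError("frame and mask must have the same width")
        out.append([pixel if keep else fill for pixel, keep in zip(row, mask_row)])
    return out


def mask_clip(frames: List[Frame], masks: List[Mask], fill: int = 0) -> List[Frame]:
    """Apply one mask per frame to every frame of a clip."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    return [mask_background(f, m, fill) for f, m in zip(frames, masks)]
```

In practice the same operation would likely be done on image tensors with an array library; the nested-list version only shows the logic of suppressing everything outside the person region before the clip reaches the classifier.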

Problem

Research questions and friction points this paper is trying to address.

Analyzes background bias in action recognition models

Proposes mitigation strategies to reduce bias

Explores prompt tuning for human-focused reasoning
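One plausible way to quantify background bias is sketched below. It assumes "conflict" samples in which the person's action and the scene suggest different labels, and reports how often a model follows each cue. The summary does not spell out the paper's metric, so the `ConflictSample` type and the rate definitions here are hypothetical illustrations only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ConflictSample:
    human_label: str       # action actually performed by the person
    background_label: str  # action the scene alone would suggest
    prediction: str        # label produced by the model under test


def bias_rates(samples: Iterable[ConflictSample]) -> Dict[str, float]:
    """Fraction of predictions that follow the human cue, the background cue, or neither."""
    items = list(samples)
    if not items:
        raise ValueError("at least one sample is required")
    if any(s.human_label == s.background_label for s in items):
        raise ValueError("conflict samples need differing human and background labels")
    n = len(items)
    human = sum(s.prediction == s.human_label for s in items)
    background = sum(s.prediction == s.background_label for s in items)
    return {
        "human": human / n,
        "background": background / n,
        "other": (n - human - background) / n,
    }
```

Under this definition, a mitigation helps when the "background" rate drops or the "human" rate rises on the same set of conflict samples.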

Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporating segmented human input reduces bias

Manual prompt tuning steers models toward human-focused reasoning

Automated prompt design improves action recognition focus
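Automated prompt design can be pictured as a search over candidate prompts, scored by how human-focused the resulting predictions are. The sketch below assumes candidates are built by combining handcrafted prefixes with instruction variants and that a scoring callable is supplied by the caller; both functions and the toy scorer are hypothetical and do not reproduce the paper's search procedure.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def build_candidates(prefixes: Sequence[str], instructions: Sequence[str]) -> List[str]:
    """Combine every handcrafted prefix with every instruction variant."""
    return [f"{prefix} {instruction}".strip() for prefix, instruction in product(prefixes, instructions)]


def select_prompt(candidates: Sequence[str], score: Callable[[str], float]) -> Tuple[str, float]:
    """Return the highest-scoring prompt and its score (first one wins on ties)."""
    if not candidates:
        raise ValueError("at least one candidate prompt is required")
    scored = [(score(prompt), prompt) for prompt in candidates]
    best_score, best_prompt = max(scored, key=lambda pair: pair[0])
    return best_prompt, best_score


# Toy scorer standing in for a real evaluation, e.g. the rate of
# human-focused predictions a VLLM produces on a validation set.
def toy_score(prompt: str) -> int:
    return int("person" in prompt) + int("Ignore the scene" in prompt)


candidates = build_candidates(
    ["", "Ignore the scene."],
    ["What action is shown?", "What is the person doing?"],
)
best = select_prompt(candidates, toy_score)
```

A real scorer would run the model on held-out clips for each prompt, so the number of candidates directly controls the cost of the search.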
