🤖 AI Summary
Conventional video anomaly detection (VAD) methods rely either on large-scale labeled data or on computationally intensive modeling; meanwhile, existing tuning-free multimodal large language model (MLLM)-based approaches are constrained by their textual outputs, leading to loss of anomaly cues, normalcy bias, and prompt sensitivity.
Method: We propose the first head-level probing framework that directly identifies robust, anomaly-sensitive attention heads within a frozen MLLM, bypassing text generation entirely, to enable tuning-free, real-time, and interpretable VAD. Our approach introduces a multi-criteria (saliency + stability) robust-head identification module, coupled with a lightweight anomaly scorer and temporal locator.
Contribution/Results: On the UCF-Crime and XD-Violence benchmarks, our method achieves state-of-the-art performance among tuning-free methods with efficient inference. These results empirically validate the effectiveness and practicality of mining discriminative attention heads for real-world VAD.
📝 Abstract
Video Anomaly Detection (VAD) aims to locate events that deviate from normal patterns in videos. Traditional approaches often rely on extensive labeled data and incur high computational costs. Recent tuning-free methods based on Multimodal Large Language Models (MLLMs) offer a promising alternative by leveraging their rich world knowledge. However, these methods typically rely on textual outputs, which introduces information loss, exhibits normalcy bias, and suffers from prompt sensitivity, making them insufficient for capturing subtle anomalous cues. To address these constraints, we propose HeadHunt-VAD, a novel tuning-free VAD paradigm that bypasses textual generation by directly hunting robust anomaly-sensitive internal attention heads within the frozen MLLM. Central to our method is a Robust Head Identification module that systematically evaluates all attention heads using a multi-criteria analysis of saliency and stability, identifying a sparse subset of heads that are consistently discriminative across diverse prompts. Features from these expert heads are then fed into a lightweight anomaly scorer and a temporal locator, enabling efficient and accurate anomaly detection with interpretable outputs. Extensive experiments show that HeadHunt-VAD achieves state-of-the-art performance among tuning-free methods on two major VAD benchmarks while maintaining high efficiency, validating head-level probing in MLLMs as a powerful and practical solution for real-world anomaly detection.
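To make the head-selection idea concrete, here is a minimal sketch of how a saliency-plus-stability criterion could pick out prompt-robust heads. This is an illustration only, not the paper's actual implementation: the per-head feature definition (one anomaly-evidence score per head, per prompt, per clip), the rank-combination rule, and the small labeled probe set are all assumptions.

```python
import numpy as np

def select_robust_heads(feats, labels, k=8):
    """Pick k heads that are both salient and stable across prompts.

    feats:  (num_prompts, num_heads, num_clips) array of per-head anomaly
            evidence (e.g. how strongly a head attends to video tokens).
    labels: (num_clips,) binary anomaly labels for a small probe set.
    """
    pos, neg = labels == 1, labels == 0
    # Per-prompt separation between anomalous and normal clips: (P, H)
    sep = feats[:, :, pos].mean(-1) - feats[:, :, neg].mean(-1)
    saliency = sep.mean(axis=0)    # large average margin over prompts
    stability = -sep.std(axis=0)   # low variance of the margin across prompts
    # Combine the two criteria by summing their rank positions
    rank = saliency.argsort().argsort() + stability.argsort().argsort()
    return np.argsort(rank)[-k:]
```

Features from the selected heads would then feed a lightweight scorer; the key point the sketch captures is that a head must discriminate anomalies consistently under every prompt, not just under one, to be retained.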