ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual anomaly detection (VAD) in complex environmental monitoring faces significant challenges due to strong contextual dependencies and ambiguous anomaly definitions, leading to high epistemic and aleatoric uncertainty. Method: We propose the first end-to-end probabilistic inference framework for multimodal large language models (MLLMs) that jointly integrates uncertainty quantification (UQ), chain-of-thought (CoT) reasoning, and a self-reflection mechanism. UQ is embedded throughout the MLLM’s multi-step reasoning pipeline, augmented by model ensembling and quality assurance strategies to ensure interpretable and robust anomaly discrimination. Contribution/Results: Evaluated on two real-world tasks—smart-home surveillance and clinical wound image classification—the framework substantially outperforms existing state-of-the-art methods. It demonstrates strong cross-domain generalization capability and high-precision detection performance, establishing a novel paradigm for trustworthy multimodal anomaly detection.

📝 Abstract
The advance of Large Language Models (LLMs) has greatly stimulated research interest in developing multi-modal LLM (MLLM)-based visual anomaly detection (VAD) algorithms that can be deployed in complex environments. The challenge is that in these environments, anomalies are often highly contextual and ambiguous, so uncertainty quantification (UQ) is a crucial capability for an MLLM-based VAD system to succeed. In this paper, we introduce ALARM, our UQ-supported MLLM-based VAD framework. ALARM integrates UQ with quality-assurance techniques such as chain-of-thought reasoning, self-reflection, and MLLM ensembling for robust and accurate performance, and is built on a rigorous probabilistic inference pipeline and computational process. Extensive empirical evaluations on real-world smart-home benchmark data and wound image classification data show ALARM's superior performance and its general applicability across domains for reliable decision-making.
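As a rough illustration of the self-reflection mechanism named in the abstract, here is a minimal sketch. It assumes the MLLM is exposed as a simple prompt-to-text callable; `mllm`, the prompts, and the stopping rule are illustrative assumptions, not the paper's actual implementation.

```python
def self_reflect(query, mllm, max_rounds=3):
    # `mllm` is assumed to be a callable mapping a prompt string to a text
    # reply; this loop and its prompts are illustrative, not from the paper.
    answer = mllm(query)
    for _ in range(max_rounds):
        # Ask the model to critique its own previous answer.
        critique = mllm(f"Critique this answer for errors: {answer}")
        if "no issue" in critique.lower():
            break  # the model endorses its answer; stop reflecting
        # Otherwise, revise the answer in light of the critique.
        answer = mllm(f"Revise given this critique: {critique}\nAnswer: {answer}")
    return answer
```

Bounding the loop with `max_rounds` prevents a model that never endorses its own answer from reflecting indefinitely.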
Problem

Research questions and friction points this paper is trying to address.

Detects visual anomalies in complex environments using MLLMs
Quantifies uncertainty for ambiguous contextual anomalies
Ensures robust performance across different application domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates uncertainty quantification with MLLM-based anomaly detection
Uses reasoning chain, self-reflection, and ensemble for robustness
Designed on probabilistic inference for reliable decision-making
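One simple way to realize the ensemble-plus-UQ idea in the bullets above is predictive entropy over the verdicts of several MLLM runs: a sketch of that idea, not the paper's full probabilistic pipeline. The function names, labels, and entropy threshold below are illustrative assumptions.

```python
from collections import Counter
from math import log2

def ensemble_uncertainty(votes):
    """Predictive entropy (in bits) over categorical votes, e.g. one
    "anomaly"/"normal" verdict per ensemble member."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def flag_anomaly(votes, entropy_threshold=0.9):
    """Majority verdict, plus an 'uncertain' flag when the ensemble
    disagrees enough that entropy exceeds the (illustrative) threshold."""
    label = Counter(votes).most_common(1)[0][0]
    uncertain = ensemble_uncertainty(votes) > entropy_threshold
    return label, uncertain

# Unanimous ensemble: confident verdict
print(flag_anomaly(["anomaly"] * 5))  # ('anomaly', False)
# Split 3-2 ensemble: majority verdict retained but flagged uncertain
print(flag_anomaly(["anomaly", "normal", "anomaly", "normal", "normal"]))  # ('normal', True)
```

High entropy signals disagreement among ensemble members, which a deployed system could route to a human reviewer rather than acting on automatically.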
👥 Authors
Congjing Zhang
Department of Industrial and Systems Engineering, University of Washington
Feng Lin
Department of Industrial and Systems Engineering, University of Washington
Xinyi Zhao
Columbia University
Pei Guo
Soochow University
Wei Li
Wyze Labs, Inc.
Lin Chen
Wyze Labs, Inc.
Chaoyue Zhao
Department of Industrial and Systems Engineering, University of Washington
Shuai Huang
Department of Industrial and Systems Engineering, University of Washington