Multimodal Anomaly Detection with a Mixture-of-Experts

📅 2025-06-23
🤖 AI Summary
This paper addresses anomaly detection in robotic manipulation, where failures stem from two sources: robot-driven anomalies (e.g., an insufficient task model or hardware limitations) and environment-driven anomalies (e.g., dynamic scene changes or external interference). The authors propose a multimodal mixture-of-experts (MoE) framework that pairs two complementary detectors: a vision-language model that monitors the environment, and a Gaussian mixture regression (GMR) detector that tracks deviations in interaction forces and robot motions. A confidence-based fusion mechanism dynamically selects the most reliable detector for each situation. Evaluated on household and industrial tasks with two robotic systems, the approach reduces detection delay by 60% and improves frame-wise anomaly detection performance compared to the individual detectors.

📝 Abstract
With a growing number of robots being deployed across diverse applications, robust multimodal anomaly detection becomes increasingly important. In robotic manipulation, failures typically arise from (1) robot-driven anomalies due to an insufficient task model or hardware limitations, and (2) environment-driven anomalies caused by dynamic environmental changes or external interferences. Conventional anomaly detection methods focus either on the first by low-level statistical modeling of proprioceptive signals or the second by deep learning-based visual environment observation, each with different computational and training data requirements. To effectively capture anomalies from both sources, we propose a mixture-of-experts framework that integrates the complementary detection mechanisms with a visual-language model for environment monitoring and a Gaussian-mixture regression-based detector for tracking deviations in interaction forces and robot motions. We introduce a confidence-based fusion mechanism that dynamically selects the most reliable detector for each situation. We evaluate our approach on both household and industrial tasks using two robotic systems, demonstrating a 60% reduction in detection delay while improving frame-wise anomaly detection performance compared to individual detectors.
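
The confidence-based fusion described in the abstract can be illustrated with a minimal sketch. It assumes each detector reports an anomaly score and a confidence value, both in [0, 1], and that fusion simply trusts the most confident detector. The names `DetectorOutput` and `fuse_by_confidence`, the score ranges, and the 0.5 threshold are hypothetical and chosen for illustration; the abstract does not specify how confidence is computed or how the selection is implemented.

```python
from dataclasses import dataclass


@dataclass
class DetectorOutput:
    """Per-frame output of one expert (hypothetical structure)."""
    name: str
    anomaly_score: float  # assumed to lie in [0, 1]
    confidence: float     # assumed to lie in [0, 1]


def fuse_by_confidence(outputs, threshold=0.5):
    """Select the most confident detector and apply its decision.

    Returns (name of the selected detector, anomaly flag).
    The threshold is an assumption, not a value from the paper.
    """
    if not outputs:
        raise ValueError("at least one detector output is required")
    best = max(outputs, key=lambda o: o.confidence)
    return best.name, best.anomaly_score >= threshold


# One frame: the force/motion detector is more confident than the
# vision-language detector, so its decision is used.
frame = [
    DetectorOutput("vlm", anomaly_score=0.2, confidence=0.4),
    DetectorOutput("gmr", anomaly_score=0.9, confidence=0.8),
]
selected, is_anomaly = fuse_by_confidence(frame)
```

A hard selection like this mirrors the phrase "selects the most reliable detector"; a confidence-weighted average of the scores would be an alternative design, but nothing in the abstract indicates which one the authors use.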

Problem

Research questions and friction points this paper is trying to address.

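Detecting robot-driven anomalies caused by an insufficient task model or hardware limitations
Detecting environment-driven anomalies caused by dynamic environmental changes or external interference
Combining both detection mechanisms to improve frame-wise detection performance and reduce detection delay

Innovation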

Methods, ideas, or system contributions that make the work stand out.

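Mixture-of-experts framework for multimodal anomaly detection
Visual-language model monitors the environment for anomalies
Gaussian-mixture regression detector tracks deviations in interaction forces and robot motions
Confidence-based fusion dynamically selects the most reliable detector for each situation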
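
To make the Gaussian-mixture regression idea concrete, the sketch below conditions a Gaussian mixture over (time, signal) on the current time to obtain an expected signal value and variance, then flags an observation that deviates by more than k standard deviations. This is the textbook GMR formulation in one input and one output dimension. The function names, the k-sigma rule, and the mixture parameters are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def gmr_predict(t, priors, means, covs):
    """Condition a 2-D GMM over (t, x) on t.

    priors: (K,), means: (K, 2), covs: (K, 2, 2).
    Returns the conditional mean and variance of x given t.
    """
    priors = np.asarray(priors, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mu_t, mu_x = means[:, 0], means[:, 1]
    s_tt, s_tx, s_xx = covs[:, 0, 0], covs[:, 0, 1], covs[:, 1, 1]

    # Responsibility of each component for the input t.
    lik = np.exp(-0.5 * (t - mu_t) ** 2 / s_tt) / np.sqrt(2 * np.pi * s_tt)
    h = priors * lik
    h = h / h.sum()

    # Per-component conditional mean and variance of x given t.
    m_k = mu_x + s_tx / s_tt * (t - mu_t)
    v_k = s_xx - s_tx ** 2 / s_tt

    # Moments of the resulting mixture.
    mean = np.sum(h * m_k)
    var = np.sum(h * (v_k + m_k ** 2)) - mean ** 2
    return float(mean), float(var)


def is_deviation(x_obs, mean, var, k=3.0):
    """Flag x_obs if it lies more than k standard deviations from the mean."""
    return bool(abs(x_obs - mean) > k * np.sqrt(var))


# Hypothetical two-component model of a force signal over normalized time.
priors = [0.5, 0.5]
means = [[0.25, 1.0], [0.75, 3.0]]
covs = [[[0.02, 0.01], [0.01, 0.05]],
        [[0.02, 0.01], [0.01, 0.05]]]
mean, var = gmr_predict(0.3, priors, means, covs)
flagged = is_deviation(6.0, mean, var)
```

In practice the mixture parameters would be fitted to demonstrations of nominal task executions, and the signal would be multidimensional (forces and motions), which replaces the scalar divisions above with matrix inverses.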