SGANet: Semantic and Geometric Alignment for Multimodal Multi-view Anomaly Detection

📅 2026-04-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses feature inconsistency in multimodal multi-view industrial defect detection, which arises from viewpoint variations and modality discrepancies. To learn physically coherent feature representations across views and modalities, the authors propose SGANet, a unified framework that jointly models semantic and geometric alignment. It combines three components: a Selective Cross-view Feature Refinement Module (SCFRM), Semantic-Structural Patch Alignment (SSPA), and Multi-View Geometric Alignment (MVGA), which together improve cross-view feature interaction and consistency. Experiments on the SiM3D and Eyecandies datasets show that SGANet achieves state-of-the-art performance in both anomaly detection and localization, supporting its effectiveness in realistic industrial scenarios.

📝 Abstract

Multi-view anomaly detection aims to identify surface defects on complex objects using observations captured from multiple viewpoints. However, existing unsupervised methods often suffer from feature inconsistency arising from viewpoint variations and modality discrepancies. To address these challenges, we propose a Semantic and Geometric Alignment Network (SGANet), a unified framework for multimodal multi-view anomaly detection that effectively combines semantic and geometric alignment to learn physically coherent feature representations across viewpoints and modalities. SGANet consists of three key components. The Selective Cross-view Feature Refinement Module (SCFRM) selectively aggregates informative patch features from adjacent views to enhance cross-view feature interaction. The Semantic-Structural Patch Alignment (SSPA) enforces semantic alignment across modalities while maintaining structural consistency under viewpoint transformations. The Multi-View Geometric Alignment (MVGA) further aligns geometrically corresponding patches across viewpoints. By jointly modeling feature interaction, semantic and structural consistency, and global geometric correspondence, SGANet effectively enhances anomaly detection performance in multimodal multi-view settings. Extensive experiments on the SiM3D and Eyecandies datasets demonstrate that SGANet achieves state-of-the-art performance in both anomaly detection and localization, validating its effectiveness in realistic industrial scenarios.
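The abstract describes SCFRM and MVGA only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes that "selective aggregation" can be approximated by keeping the top-k most cosine-similar patches from adjacent views and mixing them with softmax weights, and that geometric alignment can be expressed as a cosine-distance penalty over patch pairs known to show the same surface point. The names `refine_patch`, `geometric_alignment_loss`, `top_k`, and `temperature` are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def refine_patch(query, neighbor_patches, top_k=2, temperature=0.1):
    """Refine one patch feature with the top_k most similar adjacent-view patches.

    Assumption: selection by cosine similarity and softmax weighting stand in
    for whatever selection rule the paper actually uses.
    """
    scored = sorted(
        ((cosine(query, p), p) for p in neighbor_patches),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    if not scored:
        return list(query)

    # Numerically stable softmax over the selected similarities.
    best = max(score for score, _ in scored)
    weights = [math.exp((score - best) / temperature) for score, _ in scored]
    total = sum(weights)

    aggregated = [
        sum(w * patch[i] for w, (_, patch) in zip(weights, scored)) / total
        for i in range(len(query))
    ]
    # Residual connection: keep the original feature and add cross-view context.
    return [q + a for q, a in zip(query, aggregated)]


def geometric_alignment_loss(feats_a, feats_b, correspondences):
    """Mean cosine distance over (i, j) patch pairs assumed to be geometrically matched."""
    if not correspondences:
        return 0.0
    distances = [1.0 - cosine(feats_a[i], feats_b[j]) for i, j in correspondences]
    return sum(distances) / len(distances)
```

In this sketch, correspondences would come from known camera geometry (for example, projecting 3D points into two views); how SGANet actually derives them is not stated in the abstract.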

Problem

Research questions and friction points this paper is trying to address.

multi-view anomaly detection

feature inconsistency

viewpoint variations

modality discrepancies

surface defect detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Alignment

Geometric Alignment

Multi-view Anomaly Detection