SGANet: Semantic and Geometric Alignment for Multimodal Multi-view Anomaly Detection

📅 2026-04-07
🤖 AI Summary
This work addresses feature inconsistency in multi-view, multimodal industrial defect detection, which arises from viewpoint variations and modality discrepancies. To obtain physically consistent feature representations across views and modalities, the authors propose SGANet, a unified framework that jointly models semantic and geometric alignment. The method combines a Selective Cross-view Feature Refinement Module (SCFRM), Semantic-Structural Patch Alignment (SSPA), and Multi-View Geometric Alignment (MVGA), which together strengthen cross-view feature interaction and enforce feature consistency. Experiments on the SiM3D and Eyecandies datasets show that SGANet achieves state-of-the-art performance in both anomaly detection and localization, supporting its effectiveness in realistic industrial scenarios.
📝 Abstract
Multi-view anomaly detection aims to identify surface defects on complex objects using observations captured from multiple viewpoints. However, existing unsupervised methods often suffer from feature inconsistency arising from viewpoint variations and modality discrepancies. To address these challenges, we propose a Semantic and Geometric Alignment Network (SGANet), a unified framework for multimodal multi-view anomaly detection that effectively combines semantic and geometric alignment to learn physically coherent feature representations across viewpoints and modalities. SGANet consists of three key components. The Selective Cross-view Feature Refinement Module (SCFRM) selectively aggregates informative patch features from adjacent views to enhance cross-view feature interaction. The Semantic-Structural Patch Alignment (SSPA) enforces semantic alignment across modalities while maintaining structural consistency under viewpoint transformations. The Multi-View Geometric Alignment (MVGA) further aligns geometrically corresponding patches across viewpoints. By jointly modeling feature interaction, semantic and structural consistency, and global geometric correspondence, SGANet effectively enhances anomaly detection performance in multimodal multi-view settings. Extensive experiments on the SiM3D and Eyecandies datasets demonstrate that SGANet achieves state-of-the-art performance in both anomaly detection and localization, validating its effectiveness in realistic industrial scenarios.
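The abstract describes SCFRM and MVGA only at a high level. The NumPy sketch below illustrates the two ideas under assumptions that are ours, not the paper's: "selective aggregation" is read as a softmax over only the top-k most similar adjacent-view patches, and "geometrically corresponding patches" are found by projecting known 3D points through calibrated pinhole cameras. All function names and parameters (`k`, `alpha`, `patch_size`, `grid_hw`) are hypothetical, and SSPA is not sketched.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def selective_cross_view_refine(query, neighbor, k=4, alpha=0.5):
    """Hypothetical SCFRM-style refinement (an assumption, not the paper's module).

    query:    (N, C) patch features of the current view
    neighbor: (M, C) patch features of an adjacent view
    Each query patch attends only to its top-k most similar neighbor patches,
    and the weighted aggregate is blended back in with weight alpha.
    """
    sim = l2_normalize(query) @ l2_normalize(neighbor).T       # (N, M) cosine similarities
    k = min(k, neighbor.shape[0])
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]          # (N, k) top-k neighbor indices
    top = np.take_along_axis(sim, idx, axis=1)                 # (N, k) their similarities
    w = np.exp(top - top.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                          # softmax over selected patches only
    agg = np.einsum("nk,nkc->nc", w, neighbor[idx])            # (N, C) weighted aggregate
    return (1.0 - alpha) * query + alpha * agg


def patch_index_in_view(points, K, R, t, patch_size, grid_hw):
    """Project (P, 3) world points with a pinhole camera (K, R, t) and return the
    flat patch index each point lands in, or -1 if behind the camera or off-grid."""
    cam = points @ R.T + t                                     # world -> camera coordinates
    z = cam[:, 2]
    safe_z = np.where(np.abs(z) < 1e-9, 1e-9, z)               # guard against division by zero
    uv = (cam @ K.T)[:, :2] / safe_z[:, None]                  # pixel coordinates (u, v)
    col = np.floor(uv[:, 0] / patch_size).astype(int)
    row = np.floor(uv[:, 1] / patch_size).astype(int)
    h, w = grid_hw
    valid = (z > 0) & (row >= 0) & (row < h) & (col >= 0) & (col < w)
    return np.where(valid, row * w + col, -1)


def geometric_alignment_loss(feat_a, feat_b, patch_a, patch_b):
    """Mean (1 - cosine similarity) over patch pairs that see the same 3D point in views a and b."""
    ok = (patch_a >= 0) & (patch_b >= 0)
    if not ok.any():
        return 0.0
    fa = l2_normalize(feat_a[patch_a[ok]])
    fb = l2_normalize(feat_b[patch_b[ok]])
    return float(np.mean(1.0 - np.sum(fa * fb, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(16, 8))      # 4x4 patch grid, 8-dim features per patch
    feats_b = rng.normal(size=(16, 8))
    refined = selective_cross_view_refine(feats_a, feats_b, k=3)
    K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
    pts = np.array([[0.0, 0.0, 1.0], [0.1, 0.1, 1.0]])
    pa = patch_index_in_view(pts, K, np.eye(3), np.zeros(3), 16, (4, 4))
    pb = patch_index_in_view(pts, K, np.eye(3), np.array([-0.1, 0.0, 0.0]), 16, (4, 4))
    print(refined.shape, pa, pb, geometric_alignment_loss(refined, feats_b, pa, pb))
```

Restricting the softmax to the top-k patches is one way to make aggregation "selective": dissimilar patches, such as those on surfaces not visible in the current view, receive zero weight rather than a small one. The geometric term only compares patches that a shared 3D point maps into, so patches without a valid correspondence do not contribute.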
Problem

Research questions and friction points this paper is trying to address.

multi-view anomaly detection
feature inconsistency
viewpoint variations
modality discrepancies
surface defect detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Alignment
Geometric Alignment
Multi-view Anomaly Detection
Multimodal Learning
Feature Consistency
Letian Bai
Smart Manufacturing Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511453, China
Chengyu Tao
College of Mechanical and Vehicle Engineering, Hunan University, Changsha, 410082, China
Juan Du
Professor at NIMTE, CAS
magnetic materials, nanomaterial assembly, semiconductor devices