2D_3D Feature Fusion via Cross-Modal Latent Synthesis and Attention Guided Restoration for Industrial Anomaly Detection

📅 2025-10-19

🤖 AI Summary
In industrial anomaly detection (IAD), robust cross-modal fusion of 2D images and 3D point clouds remains challenging. This paper proposes an unsupervised multimodal fusion framework with three parts. (i) It constructs a unified latent space in which RGB and point cloud features are aligned. (ii) It uses attention-guided, modality-specific decoders to reconstruct the features of each modality. (iii) It localizes anomalies at a fine-grained level from the reconstruction error. The method consists of a shared fusion encoder, attention-based decoders, a composite loss function, and anomaly scoring based on reconstruction error. On MVTec 3D-AD and Eyecandies, it achieves mean image-level AUROC (I-AUROC) scores of 0.972 and 0.901, respectively, which the authors report as state-of-the-art. It also performs strongly in few-shot settings. Its core contributions are cross-modal latent synthesis and attention-driven disentangled reconstruction, which together enable label-free, collaborative 2D–3D anomaly localization.
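The summary names a composite loss function but does not list its terms. The sketch below shows one plausible form. It assumes the loss is a weighted sum of per-modality mean-squared reconstruction errors plus a cosine-distance term. The function names and the weights `w_rgb`, `w_pc`, and `w_cos` are hypothetical and are not values or definitions taken from the paper.

```python
import math
from typing import List, Sequence

Features = List[Sequence[float]]  # one feature vector per patch/point


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; about 0 when both vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        # Direction undefined for a zero vector; leave it to the MSE terms.
        return 0.0
    return 1.0 - dot / norm


def composite_loss(
    rgb_in: Features, rgb_out: Features,
    pc_in: Features, pc_out: Features,
    w_rgb: float = 1.0, w_pc: float = 1.0, w_cos: float = 0.5,
) -> float:
    """Hypothetical composite loss: weighted per-modality MSE plus a cosine term.

    The default weights are illustrative assumptions, not values from the paper.
    """
    rgb_term = sum(mse(a, b) for a, b in zip(rgb_in, rgb_out)) / len(rgb_in)
    pc_term = sum(mse(a, b) for a, b in zip(pc_in, pc_out)) / len(pc_in)
    # Cosine term: mean directional mismatch over all locations of both modalities.
    pairs = list(zip(rgb_in, rgb_out)) + list(zip(pc_in, pc_out))
    cos_term = sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)
    return w_rgb * rgb_term + w_pc * pc_term + w_cos * cos_term
```

When both reconstructions are perfect, the loss is zero up to floating-point rounding. Any deviation in either modality raises it. Minimizing a loss of this kind on defect-free training data therefore teaches the decoders to restore normal features well, which is what makes reconstruction error usable as an anomaly signal.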

📝 Abstract
Industrial anomaly detection (IAD) increasingly benefits from integrating 2D and 3D data, but robust cross-modal fusion remains challenging. We propose a novel unsupervised framework, Multi-Modal Attention-Driven Fusion Restoration (MAFR), which synthesises a unified latent space from RGB images and point clouds using a shared fusion encoder, followed by attention-guided, modality-specific decoders. Anomalies are localised by measuring reconstruction errors between input features and their restored counterparts. Evaluations on the MVTec 3D-AD and Eyecandies benchmarks demonstrate that MAFR achieves state-of-the-art results, with a mean I-AUROC of 0.972 and 0.901, respectively. The framework also exhibits strong performance in few-shot learning settings, and ablation studies confirm the critical roles of the fusion architecture and composite loss. MAFR offers a principled approach for fusing visual and geometric information, advancing the robustness and accuracy of industrial anomaly detection. Code is available at https://github.com/adabrh/MAFR
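I-AUROC treats each test sample's image-level anomaly score as a classifier output. It measures how reliably anomalous samples are ranked above normal ones. The reported 0.972 and 0.901 are presumably averages over the object categories of each benchmark, which is customary for these datasets. The sketch below computes the metric with the Mann-Whitney formulation and averages it per category. The category names and scores are made up. The authors' evaluation code may instead use a library routine such as scikit-learn's `roc_auc_score`.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Image-level AUROC via the Mann-Whitney U statistic.

    labels: 1 = anomalous sample, 0 = normal sample. A tie between an
    anomalous and a normal score counts as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(per_category: Dict[str, Tuple[List[float], List[int]]]) -> float:
    """Unweighted average of the per-category I-AUROC values."""
    values = [auroc(s, y) for s, y in per_category.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Made-up scores for two hypothetical categories.
    data = {
        "category_a": ([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]),
        "category_b": ([0.2, 0.9], [0, 1]),
    }
    print(auroc(*data["category_a"]))  # 0.75: 3 of 4 anomalous/normal pairs ranked correctly
    print(mean_auroc(data))  # 0.875
```

The pairwise loop is quadratic in the number of samples. That is fine for an illustration. Library implementations compute the same value by sorting the scores.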
Problem

Fusing 2D and 3D data for robust industrial anomaly detection
Creating a unified latent space from RGB images and point clouds
Localizing anomalies via reconstruction error measurement
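Localization by reconstruction error can be sketched as follows. The sketch makes three assumptions. Input and restored features for both modalities are stored on the same H x W grid. The per-location score is the sum of the RGB and point-cloud L2 errors. The image-level score is the maximum of the resulting map. These are common choices, but here they are assumptions, not the paper's stated rules, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[Sequence[float]]]  # H x W grid of feature vectors
ScoreMap = List[List[float]]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between an input feature and its restored version."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def anomaly_map(rgb_in: Grid, rgb_out: Grid, pc_in: Grid, pc_out: Grid) -> ScoreMap:
    """Per-location score = RGB reconstruction error + point-cloud reconstruction error.

    Summing the two errors is an assumed fusion rule.
    """
    height, width = len(rgb_in), len(rgb_in[0])
    return [
        [l2(rgb_in[i][j], rgb_out[i][j]) + l2(pc_in[i][j], pc_out[i][j])
         for j in range(width)]
        for i in range(height)
    ]


def image_score(amap: ScoreMap) -> float:
    """Image-level anomaly score: the largest per-location error (assumed max pooling)."""
    return max(max(row) for row in amap)


def argmax_location(amap: ScoreMap) -> Tuple[int, int]:
    """(row, col) of the most anomalous location."""
    best = (0, 0)
    for i, row in enumerate(amap):
        for j, value in enumerate(row):
            if value > amap[best[0]][best[1]]:
                best = (i, j)
    return best
```

The decoders are trained only on normal samples, so defective regions are expected to be restored poorly and to stand out in the map. A location would then be flagged when its score exceeds a threshold chosen on validation data.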
Innovation

Fuses RGB images and point clouds via a shared fusion encoder
Uses attention-guided, modality-specific decoders to restore each modality's features
Detects anomalies through reconstruction error measurement
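The three contributions form a single data flow. First, concatenated features are encoded into one latent space. Then each modality's decoder attends over the latent tokens before projecting back to its own feature space. The toy below illustrates that flow only, under several assumptions: a single linear-plus-ReLU shared encoder, and single-head scaled dot-product self-attention with a modality-specific query projection in each decoder. The class `ToyMAFR`, its layer sizes, and its random, untrained weights are hypothetical. It does not reproduce the architecture in the authors' repository.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    cols = len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(cols)]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyMAFR:
    """Hypothetical, untrained toy: shared fusion encoder + two attention decoders."""

    def __init__(self, d_rgb: int, d_pc: int, d_latent: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Shared encoder: concatenated [rgb | pc] feature -> unified latent vector.
        self.w_enc = rand_matrix(d_rgb + d_pc, d_latent, rng)
        # Each decoder gets its own query projection (an assumed form of attention guidance)...
        self.w_q: Dict[str, Matrix] = {
            m: rand_matrix(d_latent, d_latent, rng) for m in ("rgb", "pc")
        }
        # ...and its own output projection back to that modality's feature size.
        self.w_out: Dict[str, Matrix] = {
            "rgb": rand_matrix(d_latent, d_rgb, rng),
            "pc": rand_matrix(d_latent, d_pc, rng),
        }

    def encode(self, rgb: Matrix, pc: Matrix) -> Matrix:
        """One row per location: concatenate modalities, project, apply ReLU."""
        fused = [list(r) + list(p) for r, p in zip(rgb, pc)]
        return [[max(0.0, v) for v in row] for row in matmul(fused, self.w_enc)]

    def attention(self, z: Matrix, modality: str) -> Matrix:
        """Scaled dot-product weights of every latent token over all tokens."""
        q = matmul(z, self.w_q[modality])
        scale = math.sqrt(len(z[0]))
        scores = [[s / scale for s in row] for row in matmul(q, transpose(z))]
        return softmax_rows(scores)

    def decode(self, z: Matrix, modality: str) -> Matrix:
        context = matmul(self.attention(z, modality), z)  # attention-weighted latents
        return matmul(context, self.w_out[modality])

    def forward(self, rgb: Matrix, pc: Matrix) -> Tuple[Matrix, Matrix]:
        z = self.encode(rgb, pc)
        return self.decode(z, "rgb"), self.decode(z, "pc")
```

Both decoders read the same latent tokens, and each restores only its own modality. After training on defect-free samples, the restored features returned by `forward` would be compared with the inputs to produce the reconstruction-error anomaly map.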