2D–3D Feature Fusion via Cross-Modal Latent Synthesis and Attention-Guided Restoration for Industrial Anomaly Detection

📅 2025-10-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In industrial anomaly detection (IAD), robust cross-modal fusion of 2D images and 3D point clouds remains challenging. This paper proposes an unsupervised multimodal fusion framework that (i) constructs a unified latent space to align RGB and point cloud features, (ii) employs attention-guided modality-specific decoders for precise dual-modal feature reconstruction, and (iii) enables fine-grained anomaly localization via reconstruction error. The method comprises a shared fusion encoder, attention-based decoders, a composite loss function, and a reconstruction-based evaluation mechanism. Evaluated on MVTec 3D-AD and Eyecandies, it achieves mean image-level AUROC scores of 0.972 and 0.901, respectively—significantly outperforming existing unsupervised approaches, especially under few-shot conditions. Its core contributions are cross-modal latent synthesis and attention-driven disentangled reconstruction, enabling the first label-free, collaborative 2D–3D anomaly localization.

📝 Abstract
Industrial anomaly detection (IAD) increasingly benefits from integrating 2D and 3D data, but robust cross-modal fusion remains challenging. We propose a novel unsupervised framework, Multi-Modal Attention-Driven Fusion Restoration (MAFR), which synthesises a unified latent space from RGB images and point clouds using a shared fusion encoder, followed by attention-guided, modality-specific decoders. Anomalies are localised by measuring reconstruction errors between input features and their restored counterparts. Evaluations on the MVTec 3D-AD and Eyecandies benchmarks demonstrate that MAFR achieves state-of-the-art results, with a mean I-AUROC of 0.972 and 0.901, respectively. The framework also exhibits strong performance in few-shot learning settings, and ablation studies confirm the critical roles of the fusion architecture and composite loss. MAFR offers a principled approach for fusing visual and geometric information, advancing the robustness and accuracy of industrial anomaly detection. Code is available at https://github.com/adabrh/MAFR
Problem

Research questions and friction points this paper is trying to address.

Fusing 2D and 3D data for robust industrial anomaly detection
Creating unified latent space from RGB images and point clouds
Localizing anomalies via reconstruction error measurement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses RGB images and point clouds via shared encoder
Uses attention-guided decoders for modality restoration
Detects anomalies through reconstruction error measurement
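The pipeline sketched in the bullets above can be illustrated with a toy numpy example. This is a minimal sketch under stated assumptions, not the paper's implementation: attention in the decoders is omitted, the encoder/decoder weights are random and untrained, and all array names and shapes are hypothetical. It only shows the dataflow: concatenate per-location RGB and point-cloud features, project them into a shared latent, restore each modality with its own decoder, and score anomalies by per-location reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C, Z = 8, 8, 16, 32  # toy spatial grid, feature dim, latent dim
# Per-location features standing in for the two modalities' patch embeddings.
rgb = rng.normal(size=(H, W, C))
pts = rng.normal(size=(H, W, C))

# Shared fusion encoder: concatenate modalities, project to a joint latent.
w_enc = rng.normal(scale=0.1, size=(2 * C, Z))
latent = np.tanh(np.concatenate([rgb, pts], axis=-1) @ w_enc)

# Modality-specific decoders restore each feature stream from the latent
# (the paper's decoders are attention-guided; plain linear maps here).
w_rgb = rng.normal(scale=0.1, size=(Z, C))
w_pts = rng.normal(scale=0.1, size=(Z, C))
rgb_rec = latent @ w_rgb
pts_rec = latent @ w_pts

# Anomaly map: per-location reconstruction error summed over modalities;
# the image-level score is the maximum over the map.
err_map = (np.linalg.norm(rgb - rgb_rec, axis=-1)
           + np.linalg.norm(pts - pts_rec, axis=-1))
image_score = float(err_map.max())
print(err_map.shape, image_score > 0.0)
```

With trained weights, well-restored (normal) regions would yield low error while regions the decoders fail to restore would stand out in `err_map`, giving the fine-grained localization described above.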
Authors

Usman Ali — GIFT University, Pakistan
Ali Zia — La Trobe University, Australia
Abdul Rehman — GIFT University, Pakistan
Umer Ramzan — Lecturer, GIFT University, Gujranwala, Pakistan (Computer Vision, Deep Learning, Data Science)
Zohaib Hassan — GIFT University, Pakistan
Talha Sattar — GIFT University, Pakistan
Jing Wang — Department of Primary Industries, Queensland, Australia
Wei Xiang — La Trobe University, Australia