Semi-Supervised Supply Chain Fraud Detection with Unsupervised Pre-Filtering

📅 2025-08-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address label scarcity and severe class imbalance in supply chain fraud detection, this paper proposes a two-stage semi-supervised learning framework. In the first stage, Isolation Forest performs unsupervised coarse anomaly screening; in the second stage, a self-training SVM refines detection by incorporating high-confidence pseudo-labels. The method thus combines unsupervised anomaly detection with semi-supervised classification. Evaluated on the real-world DataCo supply chain dataset, it achieves an F1-score of 0.817 at a false positive rate below 3.0%, outperforming conventional supervised and single-stage semi-supervised baselines. The results suggest that the approach suits supply chain risk control under low supervision and high class imbalance.

📝 Abstract

Detecting fraud in modern supply chains is a growing challenge, driven by the complexity of global networks and the scarcity of labeled data. Traditional detection methods often struggle with class imbalance and limited supervision, reducing their effectiveness in real-world applications. This paper proposes a novel two-phase learning framework to address these challenges. In the first phase, the Isolation Forest algorithm performs unsupervised anomaly detection to identify potential fraud cases and reduce the volume of data requiring further analysis. In the second phase, a self-training Support Vector Machine (SVM) refines the predictions using both labeled and high-confidence pseudo-labeled samples, enabling robust semi-supervised learning. The proposed method is evaluated on the DataCo Smart Supply Chain Dataset, a comprehensive real-world supply chain dataset with fraud indicators. It achieves an F1-score of 0.817 while maintaining a false positive rate below 3.0%. These results demonstrate the effectiveness and efficiency of combining unsupervised pre-filtering with semi-supervised refinement for supply chain fraud detection under real-world constraints. We acknowledge limitations regarding concept drift and the need for comparison with deep learning approaches.
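
For readers unfamiliar with the two reported metrics, the sketch below shows how an F1-score and a false positive rate can be derived from binary predictions. The function name `f1_and_fpr` and the toy labels are hypothetical and chosen only for illustration; they are not the paper's data or results.

```python
import numpy as np


def f1_and_fpr(y_true, y_pred):
    """Return (F1-score, false positive rate) for binary labels, 1 = fraud."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)

    tp = int(np.sum(y_true & y_pred))    # fraud correctly flagged
    fp = int(np.sum(~y_true & y_pred))   # legitimate orders wrongly flagged
    fn = int(np.sum(y_true & ~y_pred))   # fraud that was missed
    tn = int(np.sum(~y_true & ~y_pred))  # legitimate orders correctly passed

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # FPR is the share of legitimate orders that were flagged: FP / (FP + TN).
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Toy labels (hypothetical): 3 fraud cases among 10 orders.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
toy_f1, toy_fpr = f1_and_fpr(toy_true, toy_pred)
print(f"F1 = {toy_f1:.3f}, FPR = {toy_fpr:.3f}")
```

Under heavy class imbalance the two metrics answer different questions: F1 ignores true negatives entirely, while the false positive rate is computed only over legitimate orders, which is why reporting both is informative.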

Problem

Research questions and friction points this paper is trying to address.

Detect fraud in complex global supply chains

Address class imbalance and limited labeled data

Combine unsupervised and semi-supervised learning effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised pre-filtering with Isolation Forest

Self-training SVM for semi-supervised learning

Combines anomaly detection and pseudo-labeling
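
The three components above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `two_stage_detect`, the contamination rate, the confidence threshold, the RBF kernel, the choice to train the SVM on all samples but score only the pre-filtered ones, and the synthetic data are all assumptions, since the summary does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC


def two_stage_detect(X, y_partial, contamination=0.1, threshold=0.9, seed=0):
    """Hypothetical two-stage detector.

    y_partial uses 1 = fraud, 0 = legitimate, -1 = unlabeled
    (the unlabeled marker expected by SelfTrainingClassifier).
    """
    X = StandardScaler().fit_transform(X)

    # Stage 1: unsupervised pre-filter. fit_predict returns -1 for anomalies.
    forest = IsolationForest(contamination=contamination, random_state=seed)
    suspicious = forest.fit_predict(X) == -1

    # Stage 2: self-training SVM. Unlabeled samples whose predicted class
    # probability reaches `threshold` are pseudo-labeled and added to training.
    base = SVC(kernel="rbf", probability=True, class_weight="balanced",
               random_state=seed)
    model = SelfTrainingClassifier(base, threshold=threshold)
    model.fit(X, y_partial)

    # Assumption: only samples that pass the pre-filter are scored by the SVM;
    # everything else is treated as legitimate.
    pred = np.zeros(len(X), dtype=int)
    if suspicious.any():
        pred[suspicious] = model.predict(X[suspicious])
    return pred, suspicious


# Synthetic, imbalanced stand-in data (not the DataCo dataset).
X_demo, y_demo = make_classification(n_samples=400, n_features=10,
                                     weights=[0.9, 0.1], random_state=0)
rng = np.random.default_rng(0)
hidden = rng.random(len(y_demo)) < 0.8          # hide about 80% of the labels
y_partial_demo = np.where(hidden, -1, y_demo)

pred_demo, suspicious_demo = two_stage_detect(X_demo, y_partial_demo)
print("flagged by pre-filter:", int(suspicious_demo.sum()))
print("predicted fraud:", int(pred_demo.sum()))
```

In this sketch, a stricter `threshold` admits fewer but more reliable pseudo-labels, and a lower `contamination` sends fewer samples to the second stage, at the risk of filtering out true fraud before the SVM sees it.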
