🤖 AI Summary
This paper addresses key challenges in anti-money laundering (AML): severe label scarcity, high false positive rates, and complex temporal patterns in transaction data. To tackle these, we propose a Transformer-based unsupervised representation learning framework. Methodologically: (i) contrastive learning pretrains temporal representations on unlabeled transaction sequences; (ii) a dual-threshold scoring mechanism differentiates degrees of suspiciousness; and (iii) the Benjamini–Hochberg procedure controls the false discovery rate (FDR) under multiple hypothesis testing. Experiments demonstrate that our approach significantly outperforms rule-based engines and LSTM baselines under extreme low-supervision settings (requiring only a small number of labeled samples). While maintaining a false positive rate below 0.1%, it achieves a 23.6% improvement in money laundering detection recall. Moreover, the framework generalizes well across institutions. The method combines theoretical rigor, particularly in statistical FDR control, with practical deployability in real-world AML systems.
📝 Abstract
The present work tackles the money laundering detection problem. A new procedure is introduced that exploits structured time series of both qualitative and quantitative data by means of a transformer neural network. The first step of this procedure learns representations of the time series through contrastive learning, without any labels. The second step uses these representations to compute a money laundering score for every observation. A two-threshold approach is then introduced, which keeps the false-positive rate under control by means of the Benjamini-Hochberg (BH) procedure. Experiments confirm that the transformer produces general representations that capture money laundering patterns with minimal supervision from domain experts. They also show that the new procedure is better at detecting both nonfraudsters and fraudsters while keeping the false-positive rate under control, in sharp contrast with rule-based procedures and those based on LSTM architectures.
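For readers unfamiliar with the BH procedure mentioned above, the following is a minimal sketch of the standard step-up rule, not the authors' implementation. It assumes each observation's score has already been converted to a p-value; how the paper derives p-values from its scores and how the two thresholds are set are not specified in the abstract. The function name `benjamini_hochberg`, the level `alpha`, and the example p-values are illustrative assumptions.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha.

    Standard BH step-up rule: sort the p-values, find the largest rank k
    with p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    rejected = np.zeros(m, dtype=bool)
    if m == 0:
        return rejected

    order = np.argsort(p)                        # indices sorting p ascending
    ranked = p[order]
    critical = alpha * np.arange(1, m + 1) / m   # BH critical values per rank
    below = ranked <= critical

    if below.any():
        k_max = np.nonzero(below)[0].max()       # largest rank meeting the bound
        rejected[order[: k_max + 1]] = True      # reject everything up to that rank
    return rejected


# Hypothetical p-values for ten observations (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p_values, alpha=0.05)
print(flags)
```

Because the rule is step-up, a p-value that misses its own critical value is still rejected if a larger-ranked p-value meets its bound. This is what distinguishes BH from a fixed per-test cutoff.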