BlockFound: Customized blockchain foundation model for anomaly detection

📅 2024-10-05
🏛️ arXiv.org
📈 Citations: 3
Influential: 1
🤖 AI Summary
Blockchain transaction anomaly detection faces three obstacles: transactions are multimodal (addresses, opcodes, numerical values, and textual descriptions) and therefore hard to model, off-the-shelf large language models generalize poorly to them, and rule-based systems have limited coverage. This paper introduces BlockFound, a foundation model customized for on-chain data. It combines a blockchain-specific multimodal tokenization scheme, a masked language modeling pretraining objective, and RoPE positional encoding with FlashAttention for efficient long-sequence modeling. A dedicated detection method is then applied on top of the pretrained model. On Ethereum and Solana datasets, the method detects anomalous transactions while keeping the false positive rate low. On Solana, it is the only evaluated approach that achieves high recall; all other approaches score very low or zero.

📝 Abstract
We propose BlockFound, a customized foundation model for detecting anomalous blockchain transactions. Unlike existing methods that rely on rule-based systems or directly apply off-the-shelf large language models, BlockFound introduces a series of customized designs to model the unique data structure of blockchain transactions. First, a blockchain transaction is multi-modal, containing blockchain-specific tokens, text, and numbers. We design a modularized tokenizer to handle these multi-modal inputs, balancing the information across different modalities. Second, we design a customized masked language modeling mechanism for pretraining, with RoPE embedding and FlashAttention for handling longer sequences. After training the foundation model, we further design a novel method for anomaly detection. Extensive evaluations on Ethereum and Solana transactions demonstrate BlockFound's exceptional capability in anomaly detection while maintaining a low false positive rate. Remarkably, BlockFound is the only method that successfully detects anomalous transactions on Solana with high accuracy, whereas all other approaches achieve very low or zero detection recall. This work not only provides new foundation models for blockchain but also sets a new benchmark for applying LLMs to blockchain data.
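To make the first two design points in the abstract concrete, here is a minimal sketch of a per-modality tokenizer followed by random masking for masked language modeling. It is not BlockFound's tokenizer: the special tokens `[ADDR]`, `[NUM]`, and `[MASK]`, the routing rules, the magnitude bucketing of numbers, and the 15% masking rate are all assumptions chosen for illustration.

```python
import random
import re

# Hypothetical special tokens; the paper's actual vocabulary is not given here.
MASK, ADDR, NUM = "[MASK]", "[ADDR]", "[NUM]"


def tokenize_field(value):
    """Route one transaction field to a modality-specific rule (assumed rules)."""
    if re.fullmatch(r"0x[0-9a-fA-F]{40}", value):
        # Blockchain-specific token: a tag plus a short lowercase prefix.
        return [ADDR, value[:6].lower()]
    if re.fullmatch(r"\d+(\.\d+)?", value):
        # Number: keep only the order of magnitude of the integer part,
        # so raw values do not flood the vocabulary.
        magnitude = len(value.split(".")[0])
        return [NUM, "1e%d" % (magnitude - 1)]
    # Text: lowercase and split into alphanumeric words.
    return re.findall(r"[a-z0-9_]+", value.lower())


def tokenize_transaction(fields):
    """Concatenate the tokens of every field, in order."""
    tokens = []
    for field in fields:
        tokens.extend(tokenize_field(field))
    return tokens


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace each token with [MASK] with probability mask_prob.

    Returns (masked_tokens, labels); labels holds the original token at
    masked positions and None elsewhere, which is what a masked language
    modeling loss would be computed against.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


fields = ["0x" + "ab" * 20, "1500000", "transfer tokens"]
tokens = tokenize_transaction(fields)
masked, labels = mask_tokens(tokens)
```

Each field contributes at most a few tokens regardless of how long its raw value is, which is one simple way to keep any single modality from dominating the sequence.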
Problem

Research questions and friction points this paper is trying to address.

Detecting anomalies in blockchain transactions using a customized Transformer
Handling multi-modal blockchain data with a specialized tokenizer and pretraining
Improving detection accuracy on Ethereum and Solana blockchain networks
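The abstract reports results in terms of detection recall and false positive rate. As a reminder of how these two quantities relate, the sketch below computes both from binary labels using their standard definitions. The function name and the toy labels are illustrative only and are not taken from the paper's evaluation.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false_positive_rate) for binary labels.

    1 marks an anomalous transaction and 0 a benign one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against empty classes instead of dividing by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Toy data: two anomalies (one caught) and four benign transactions (one flagged).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0]
recall, fpr = recall_and_fpr(y_true, y_pred)  # recall = 0.5, fpr = 0.25
```

A detector that flags nothing has a false positive rate of zero but also zero recall, which is why the two numbers need to be read together.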
Innovation

Methods, ideas, or system contributions that make the work stand out.

Customized Transformer for blockchain anomaly detection
Modularized tokenizer for multi-modal transaction data
Masked language modeling with RoPE and FlashAttention
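For readers unfamiliar with RoPE, the sketch below applies a rotary position embedding to a single vector in plain Python. It uses the interleaved pairing and the base of 10000 from the original RoPE formulation; whether BlockFound uses this exact variant is an assumption, and FlashAttention (an optimized attention kernel) is not reproduced here.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by angle position * base**(-i/d).

    i runs over even indices, so the exponent equals -2k/d for pair k.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Both pairs of positions have the same offset (2), so the scores agree
# up to floating-point error.
score_near = dot(rope(q, 3), rope(k, 1))
score_far = dot(rope(q, 7), rope(k, 5))
```

Because the query-key score depends only on the offset between positions and not on their absolute values, the same attention pattern can apply anywhere in a long transaction sequence.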
Jiahao Yu
Northwestern University
Xian Wu
Northwestern University
Haozhuang Liu
New York University
Wenbo Guo
UC Santa Barbara
Machine Learning, Security
Xinyu Xing
Northwestern University & Sec3