🤖 AI Summary
This paper addresses a key limitation of wash-trading detection in NFT markets: existing approaches rely on indirect statistical proxies or on private data, and cannot identify wash trades directly on-chain. We propose an end-to-end AI estimation framework that uses only publicly available on-chain data. Methodologically, we embed classical statistical tests, such as roundedness-based regressions, into a machine learning pipeline that combines on-chain behavioral modeling, calibration across heterogeneous exchange data, and ensemble-based feature engineering built on those regressions. Empirical evaluation across three major NFT exchanges indicates that approximately 38% of trades and 60% of traded value likely involve manipulation, with significant variation across exchanges. The estimator significantly reduces both exchange-level and trade-level estimation errors in NFT markets, and the results suggest it also applies to crypto markets beyond NFTs.
📝 Abstract
Existing studies on crypto wash trading often use indirect statistical methods or leaked private data, both with inherent limitations. This paper leverages public on-chain NFT data for a more direct and granular estimation. Analyzing three major exchanges, we find that ~38% (30-40%) of trades and ~60% (25-95%) of traded value likely involve manipulation, with significant variation across exchanges. This direct evidence enables a critical reassessment of existing indirect methods, identifying roundedness-based regressions à la Cong et al. (2023) as most promising, though still error-prone in the NFT setting. To address this, we develop an AI-based estimator that integrates these regressions in a machine learning framework, significantly reducing both exchange- and trade-level estimation errors in NFT markets (and beyond).
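As a rough illustration of the kind of input a roundedness-based regression might use, the sketch below computes the share of trades priced at a round number. It is not the paper's estimator: the helper names `is_round` and `round_share`, the 0.1 rounding unit, and the sample prices are all assumptions made for this example.

```python
from decimal import Decimal


def is_round(price: str, unit: str = "0.1") -> bool:
    """Return True if `price` is an exact multiple of `unit`.

    Prices are passed as strings and parsed with Decimal so that values
    such as 0.3 are not distorted by binary floating point.
    The default unit of 0.1 is an illustrative assumption.
    """
    return Decimal(price) % Decimal(unit) == 0


def round_share(prices, unit: str = "0.1") -> float:
    """Return the fraction of `prices` that are exact multiples of `unit`."""
    prices = list(prices)
    if not prices:
        raise ValueError("prices must not be empty")
    return sum(is_round(p, unit) for p in prices) / len(prices)


# Hypothetical trade prices, for illustration only.
sample_prices = ["1.0", "2.5", "0.7312", "3.14159", "10"]
share = round_share(sample_prices)
print(f"share of round-priced trades: {share:.2f}")
```

A per-exchange or per-period share of this kind could then serve as one regressor or feature among others; how the paper actually constructs and combines such features is not specified in the abstract.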