AI Summary
Existing smart contract auditing datasets rely on manual curation, which limits their scale, granularity, and diversity. To address these limitations, this work proposes GiANT, an automated framework that combines a divide-and-conquer strategy with Chain-of-Thought reasoning to extract multi-granularity vulnerability information from real-world audit reports published by Code4rena. The framework then applies an LLM-as-a-judge mechanism to validate data quality. The resulting GiAnt Corpus comprises 7,711 vulnerability instances and achieves an average human evaluation score of 4.76 out of 5 (Cohen's κ = 0.88). The study also uses the new dataset to establish the first benchmark results on four smart contract analysis tasks.
Abstract
High-quality smart contract auditing datasets are crucial for evaluating security tools and advancing smart contract security research. Existing datasets have two major limitations: a scalability bottleneck caused by manual curation, and insufficient data granularity and diversity. To address these limitations, we propose GiANT, an automated framework designed to curate smart contract auditing datasets by distilling vulnerability insights from real-world auditing reports. GiANT employs a divide-and-conquer strategy coupled with the Chain-of-Thought technique to extract structured vulnerability information from Code4rena reports, followed by an LLM-as-a-judge mechanism to perform rigorous quality assurance. To evaluate GiANT's effectiveness, we run it on 388 real-world audit reports and generate the GiAnt Corpus, comprising 7,711 vulnerability findings across five severity levels. Manual assessment of the dataset demonstrates highly reliable information extraction, with a mean quality score of $4.76\pm0.37$ (out of 5) and an inter-rater agreement $\kappa$ of 0.88. We further validate the practicality of our dataset by benchmarking four state-of-the-art LLMs on vulnerability detection, code summarization, mitigation recommendation, and automated gas optimization tasks to establish performance baselines, thereby providing a valuable data foundation for future research in automated smart contract auditing.
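To make the reported inter-rater agreement concrete, the following is a minimal sketch of how Cohen's $\kappa$ can be computed for two raters who score the same items on a 1 to 5 scale. It assumes the unweighted form of the statistic; the abstract does not say whether a weighted variant was used. The function name `cohens_kappa` and the two score lists are hypothetical illustrations, not data from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: probability that both raters pick the same label
    # if each one labels independently according to their own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label everywhere; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical quality scores (1-5) from two annotators on ten extracted findings.
scores_a = [5, 5, 4, 5, 3, 5, 4, 5, 5, 4]
scores_b = [5, 5, 4, 5, 3, 5, 5, 5, 5, 4]

kappa = cohens_kappa(scores_a, scores_b)
print(f"kappa = {kappa:.2f}")
```

In this illustrative data the raters disagree on one item out of ten, yet $\kappa$ is noticeably lower than the raw 90% agreement because most scores are 5, so a large share of the agreement is expected by chance. This is why a $\kappa$ of 0.88 on skewed quality scores indicates strong agreement.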