A Public Dataset For the ZKsync Rollup

📅 2024-07-26

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

High acquisition costs and poor data quality for Layer-2 (L2) blockchain data severely hinder data-driven research in emerging ecosystems such as ZKsync. To address this, we construct and open-source the first high-quality, structured dataset comprehensively capturing one year of on-chain activity on ZKsync Era—filling a critical gap in publicly available, high-fidelity L2 chain data. Leveraging an archival node, our pipeline employs batch synchronization, transaction decoding, state snapshot extraction, and schema normalization to produce a standardized, Parquet-formatted dataset optimized for SQL querying. It comprises over 120 million transactions, tens of millions of addresses, and complete smart contract deployment records. We also release a fully reproducible data extraction workflow and analytical templates. This dataset has already enabled cutting-edge research in MEV modeling, gas optimization, and zk-SNARK verification pattern analysis.
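As a rough illustration of the kind of SQL analysis a normalized transaction table supports, the sketch below counts transactions and sums gas per sender. It makes several assumptions: the table name `transactions`, its columns, and the sample rows are hypothetical and are not taken from the dataset's actual schema. Python's built-in `sqlite3` stands in here for a Parquet-capable SQL engine (such as DuckDB) so that the example runs without extra dependencies.

```python
import sqlite3

# Hypothetical, simplified rows; the real dataset's columns and values may differ.
# (tx_hash, block_number, from_address, to_address, gas_used)
ROWS = [
    ("0xa1", 100, "0xalice", "0xdex", 21000),
    ("0xa2", 100, "0xbob", "0xdex", 50000),
    ("0xa3", 101, "0xalice", "0xbridge", 30000),
    ("0xa4", 102, "0xalice", "0xdex", 40000),
]


def top_senders(rows, limit=2):
    """Return (from_address, tx_count, total_gas) tuples, most active first."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema for illustration only.
        conn.execute(
            "CREATE TABLE transactions ("
            "tx_hash TEXT PRIMARY KEY, block_number INTEGER, "
            "from_address TEXT, to_address TEXT, gas_used INTEGER)"
        )
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT from_address, COUNT(*) AS tx_count, SUM(gas_used) AS total_gas "
            "FROM transactions "
            "GROUP BY from_address "
            "ORDER BY tx_count DESC, from_address ASC "
            "LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for address, tx_count, total_gas in top_senders(ROWS):
        print(address, tx_count, total_gas)
```

Against the published Parquet files, the same `GROUP BY` query would be pointed at the dataset's real table and column names, which should be checked in its documentation first.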

📝 Abstract

Although blockchain data is publicly available, practical challenges and high costs often prevent researchers from using it effectively, which limits data-driven research and exploration in the blockchain space. This is especially true of Layer-2 (L2) ecosystems, and of ZKsync in particular. To address these issues, we have curated a dataset covering one year of activity, extracted from a ZKsync Era archive node, and made it freely available to external parties. We describe the dataset and how it was created, showcase a few example analyses that can be performed with it, and discuss some future research directions.

Problem

Research questions and friction points this paper is trying to address.

High costs hinder blockchain data use

Limited research in Layer-2 ecosystems

ZKsync data in particular is hard to access

Innovation

Methods, ideas, or system contributions that make the work stand out.

Public ZKsync dataset creation

One-year activity data extraction

Freely available for research
