🤖 AI Summary
Existing research lacks a globally representative Ethereum P2P dataset that aligns multiple data sources, which hinders systematic evaluation of node performance, geographic centralization risks, and P2P-layer threats. To address this gap, we introduce EtherBee, the first globally distributed, temporally synchronized Ethereum P2P dataset, collected over three months from ten geographically diverse monitoring nodes. EtherBee integrates fine-grained node metrics, PCAP-level traffic metadata, and honeypot interaction logs, so that performance, network behavior, and security events can be aligned in both time and location. Our analysis reveals a 42% geographic skew in mainstream client distribution, indicating that client-level optimizations can unintentionally concentrate the network geographically and thereby weaken censorship resistance and fault tolerance. The EtherBee dataset is publicly released and has already enabled three peer-reviewed studies on protocol enhancements and security improvements.
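The summary does not say how the "geographic skew" is measured. As an illustration only, one common way to quantify geographic concentration is to compute, per client, the share of nodes in the most popular region together with the Herfindahl-Hirschman index (HHI, the sum of squared regional shares). The sketch below uses that swapped-in metric; it is not the paper's method, and the input format (a list of `(client, region)` pairs, one per observed node) and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict


def concentration(regions):
    """Return (top_region, top_share, hhi) for a list of region labels, one per node.

    HHI is the sum of squared regional shares: it equals 1/k when nodes are
    spread evenly over k regions and 1.0 when every node sits in one region.
    """
    counts = Counter(regions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nodes to measure")
    shares = {region: n / total for region, n in counts.items()}
    # max() returns the first region reached with the largest share.
    top_region, top_share = max(shares.items(), key=lambda kv: kv[1])
    hhi = sum(s * s for s in shares.values())
    return top_region, top_share, hhi


def per_client_concentration(observations):
    """Group hypothetical (client_name, region) pairs by client and measure each group."""
    by_client = defaultdict(list)
    for client, region in observations:
        by_client[client].append(region)
    return {client: concentration(regs) for client, regs in by_client.items()}


if __name__ == "__main__":
    # Illustrative sample data only, not EtherBee measurements.
    sample = [("geth", "EU"), ("geth", "EU"), ("geth", "NA"),
              ("nethermind", "NA"), ("nethermind", "AS")]
    for client, (region, share, hhi) in per_client_concentration(sample).items():
        print(f"{client}: top region {region} ({share:.0%}), HHI {hhi:.2f}")
```

A higher top-region share or HHI for a widely used client would signal the kind of geographic concentration the case study is concerned with; the choice of region granularity (country, cloud region, ASN) strongly affects both numbers.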
📝 Abstract
We introduce EtherBee, a global dataset integrating detailed Ethereum node metrics, network traffic metadata, and honeypot interaction logs collected from ten geographically diverse vantage points over three months. By correlating node data with granular network sessions and security events, EtherBee provides unique insights into benign and malicious activity, node stability, and network-level threats in the Ethereum peer-to-peer network. A case study shows how client-based optimizations can unintentionally concentrate the network geographically, impacting resilience and censorship resistance. We publicly release EtherBee to promote further investigations into performance, reliability, and security in decentralized networks.
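Correlating security events with network sessions, as the abstract describes, amounts to a time-window join across data sources. The sketch below is a minimal illustration under assumed schemas: the `Session` and `HoneypotEvent` records, their field names, and the `slack` tolerance for clock offsets are all hypothetical and are not EtherBee's actual format. It pairs each honeypot event with sessions from the same peer, observed at the same vantage point, whose time window contains the event.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    # Hypothetical session record derived from traffic metadata.
    vantage: str   # monitoring node that observed the session
    peer_ip: str
    start: float   # Unix timestamp, seconds
    end: float


@dataclass(frozen=True)
class HoneypotEvent:
    # Hypothetical honeypot log entry.
    vantage: str
    peer_ip: str
    ts: float      # Unix timestamp, seconds
    kind: str


def correlate(sessions, events, slack=1.0):
    """Return (event, session) pairs where both share vantage and peer IP and
    the event time falls inside [start - slack, end + slack].

    slack (assumed, in seconds) absorbs small clock offsets between log sources.
    """
    index = defaultdict(list)
    for s in sessions:
        index[(s.vantage, s.peer_ip)].append(s)
    matches = []
    for e in events:
        for s in index.get((e.vantage, e.peer_ip), []):
            if s.start - slack <= e.ts <= s.end + slack:
                matches.append((e, s))
    return matches


if __name__ == "__main__":
    sessions = [Session("fra", "203.0.113.5", 100.0, 200.0),
                Session("sgp", "203.0.113.5", 100.0, 200.0)]
    events = [HoneypotEvent("fra", "203.0.113.5", 150.0, "malformed_hello")]
    for event, session in correlate(sessions, events):
        print(event.kind, "->", session.vantage, session.start, session.end)
```

Keying the index on both vantage and peer IP keeps events seen at one monitoring node from being attributed to sessions recorded at another; events that match no session (for example, connection attempts that never completed a handshake) simply produce no pair and can be analyzed separately.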