🤖 AI Summary
Prior research lacks a longitudinal empirical foundation for studying user behavior, information diffusion, and community evolution on Discord. To address this gap, we construct the largest publicly available Discord dataset to date. It spans 2015–2024 and comprises 3,167 servers, 2.05 billion messages, and 4.74 million users, covering approximately 10% of the servers listed in Discord's Discovery feature. The dataset supports multilingual analysis (English, Spanish, French, Portuguese, etc.) and structured computational social science workflows. Data were collected via Discord's public API, anonymized, and standardized into JSON format. Our preliminary analysis reveals three key findings: (1) user activity that varies markedly over time; (2) substantial growth in bot adoption across communities; and (3) the prevalence of non-gaming themes, including social, art, music, and meme communities, reflecting Discord's expansion beyond its gaming origins. This dataset fills a critical gap in empirical research on decentralized moderation, community governance, and online community dynamics.
📝 Abstract
Discord has evolved from a gaming-focused communication tool into a versatile platform supporting diverse online communities. Despite its large user base and active public servers, academic research on Discord remains limited due to data accessibility challenges. This paper introduces Discord Unveiled: A Comprehensive Dataset of Public Communication (2015-2024), the most extensive dataset of Discord public server data to date. The dataset comprises over 2.05 billion messages from 4.74 million users across 3,167 public servers, representing approximately 10% of the servers listed in Discord's Discovery feature. Spanning from Discord's launch in 2015 to the end of 2024, it offers a robust temporal and thematic framework for analyzing decentralized moderation, community governance, information dissemination, and social dynamics. Data were collected through Discord's public API and anonymized in accordance with ethical guidelines and privacy standards. Organized into structured JSON files, the dataset integrates readily with computational social science methodologies. Preliminary analyses reveal significant trends in user engagement, bot utilization, and linguistic diversity, with English predominating alongside substantial representation of Spanish, French, and Portuguese. Additionally, prevalent community themes such as social, art, music, and memes highlight Discord's expansion beyond its gaming origins.
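The abstract does not specify which anonymization technique was used or how the JSON files are laid out. As a rough illustration only, the sketch below shows one common approach: replacing user identifiers with keyed-hash pseudonyms (HMAC-SHA256) and then counting messages per year. The field names `author_id`, `content`, and `timestamp`, the key `SECRET_KEY`, and all helper functions are hypothetical and are not taken from the dataset.

```python
import hashlib
import hmac
import json
from collections import Counter

# Hypothetical secret key; a real pipeline would keep this private and random.
SECRET_KEY = b"replace-with-a-private-random-key"


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a raw user ID to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability


def anonymize_message(message: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of the message with the author ID pseudonymized."""
    cleaned = dict(message)  # shallow copy so the input record is untouched
    cleaned["author_id"] = pseudonymize(str(message["author_id"]), key)
    return cleaned


def messages_per_year(messages) -> Counter:
    """Count messages by the year prefix of an ISO 8601 timestamp."""
    return Counter(m["timestamp"][:4] for m in messages)


# Toy records in an assumed schema (not the dataset's actual schema).
raw = json.loads("""
[
  {"author_id": "1001", "content": "hello", "timestamp": "2015-05-13T12:00:00+00:00"},
  {"author_id": "1002", "content": "hola", "timestamp": "2024-01-02T08:30:00+00:00"},
  {"author_id": "1001", "content": "bye", "timestamp": "2024-12-31T23:59:00+00:00"}
]
""")

anonymized = [anonymize_message(m) for m in raw]
counts = messages_per_year(anonymized)
```

Because the pseudonym is a deterministic function of the ID and the key, the same user maps to the same pseudonym across messages, which keeps per-user longitudinal analysis possible without exposing the original identifier.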