🤖 AI Summary
This study addresses a common limitation of agent-based social simulations, namely their reliance on handcrafted rules that lack empirical grounding. It does so by constructing a large-scale, fine-grained interaction dataset from Reddit that spans technology, climate change, and COVID-19 discussions. Using user posts, comments, and temporal commenting patterns, the work defines agent categories and builds a directed, weighted interaction network. By combining natural language processing, temporal behavioral analysis, and network clustering, the dataset provides an empirical basis for calibrating and benchmarking agent behaviors, network structures, and information diffusion dynamics against observed data. Quantitative analysis reveals clear topic-dependent differences in user interaction patterns. Manual qualitative analysis indicates that agent interactions follow realistic patterns of timing, similarity between users, and sentiment change.
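The summary describes turning posts, comments, and their timing into a directed, weighted network between agents. The sketch below assumes one plausible reading: each reply becomes an edge from the replying agent to the agent it replies to, and the edge weight counts those replies. The `Item` record, the `build_reply_network` function, and the optional `max_lag` reply window are all hypothetical illustrations. They are not the authors' schema or pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Item:
    # Hypothetical record layout; the released dataset's schema may differ.
    item_id: str
    agent: str                 # aggregated agent label of the author
    parent_id: Optional[str]   # id of the post/comment replied to; None for posts
    created_utc: int           # Unix timestamp in seconds


def build_reply_network(
    items: Iterable[Item], max_lag: Optional[int] = None
) -> Dict[Tuple[str, str], int]:
    """Count directed replies (replier -> replied-to agent) as edge weights."""
    items = list(items)
    by_id = {it.item_id: it for it in items}
    edges: Counter = Counter()
    for it in items:
        parent = by_id.get(it.parent_id) if it.parent_id is not None else None
        if parent is None or parent.agent == it.agent:
            continue  # skip top-level posts, replies to unseen items, self-replies
        lag = it.created_utc - parent.created_utc
        if lag < 0 or (max_lag is not None and lag > max_lag):
            continue  # skip inconsistent timestamps and replies outside the window
        edges[(it.agent, parent.agent)] += 1
    return dict(edges)


# Toy example with made-up agents and timestamps.
items = [
    Item("p1", "tech_a", None, 0),
    Item("c1", "tech_b", "p1", 60),
    Item("c2", "tech_a", "c1", 120),
    Item("c3", "tech_b", "p1", 200),
    Item("c4", "tech_b", "c3", 300),   # self-reply, ignored
]
print(build_reply_network(items))               # {('tech_b', 'tech_a'): 2, ('tech_a', 'tech_b'): 1}
print(build_reply_network(items, max_lag=100))  # {('tech_b', 'tech_a'): 1, ('tech_a', 'tech_b'): 1}
```

Counting replies is the simplest weighting. A time-decayed or sentiment-aware weight would be a drop-in change to the line that increments `edges`.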
📝 Abstract
Agent-based modeling (ABM) provides a powerful framework for exploring how individual behaviors and interactions give rise to collective social dynamics. However, most ABMs rely on handcrafted or parameterized agent rules that are not empirically grounded, which limits their realism and their validation against observed data. To address this gap, we construct a large-scale, empirically grounded dataset from Reddit to support the development and evaluation of agent-based social simulations. The dataset includes 33 technology-focused, 14 climate-focused, and 7 COVID-related aggregated agents, encompassing around one million posts and comments. Using these publicly available posts and comments, we define agent categories based on content and interaction patterns, derive inter-agent relationships from temporal commenting behaviors, and build a directed, weighted network that reflects empirically observed user connections. The resulting dataset enables researchers to calibrate and benchmark agent behavior, network structure, and information diffusion processes against real social dynamics. Our quantitative analysis reveals clear topic-dependent differences in how users interact. Climate discussions show dense, highly connected networks with sustained engagement, COVID-related interactions are sparse and mostly one-directional, and technology discussions are organized around a small number of central hubs. Manual qualitative analysis further shows that agent interactions follow realistic patterns of timing, similarity between users, and sentiment change.
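The abstract characterizes the three topics as dense, sparse and mostly one-directional, or hub-centered. Three standard directed-graph statistics can capture these properties. Density measures how many of the possible links exist. Reciprocity measures how often a link is returned. Hub share is the fraction of incoming weight received by the top-k agents. The sketch below computes these statistics from a `{(source, target): weight}` edge map. The choice of metrics, the `network_summary` name, and the toy edge data are assumptions for illustration, not the measures or results reported in the paper.

```python
from collections import defaultdict
from typing import Dict, Tuple


def network_summary(
    edges: Dict[Tuple[str, str], float], top_k: int = 1
) -> Dict[str, float]:
    """Density, reciprocity, and top-k hub share of a weighted directed graph.

    Assumes no self-loops, so the number of possible edges is n * (n - 1).
    """
    nodes = {node for pair in edges for node in pair}
    n, m = len(nodes), len(edges)
    density = m / (n * (n - 1)) if n > 1 else 0.0

    # Fraction of edges whose reverse edge is also present.
    reciprocal = sum(1 for (s, t) in edges if (t, s) in edges)
    reciprocity = reciprocal / m if m else 0.0

    # Weighted in-degree ("in-strength") per node, then the top-k share.
    in_strength: Dict[str, float] = defaultdict(float)
    for (_, target), weight in edges.items():
        in_strength[target] += weight
    total = sum(in_strength.values())
    top = sorted(in_strength.values(), reverse=True)[:top_k]
    hub_share = sum(top) / total if total else 0.0

    return {"density": density, "reciprocity": reciprocity, "hub_share": hub_share}


# Toy star-shaped network (made-up data): most replies flow into "hub".
toy = {("a", "hub"): 3, ("b", "hub"): 2, ("c", "hub"): 1, ("hub", "a"): 1}
print(network_summary(toy))
```

On this toy graph the hub receives 6 of the 7 units of incoming weight, so `hub_share` is 6/7. Only the `a`/`hub` pair is reciprocated, giving a reciprocity of 0.5. A sparse, one-directional network would instead show low density and low reciprocity.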