Constructing a Dataset to Support Agent-Based Modeling of Online Interactions: Users, Topics, and Interaction Networks

📅 2026-01-19
🤖 AI Summary
This study addresses a common limitation of agent-based social simulations: their reliance on handcrafted rules that lack empirical grounding. The authors construct a large-scale, fine-grained interaction dataset from Reddit covering three topics: technology, climate, and COVID-19. Using public posts, comments, and temporal commenting patterns, they define agent categories and build a directed, weighted interaction network. The dataset is intended to let researchers calibrate and benchmark agent behavior, network structure, and information diffusion against observed data. The quantitative analysis shows clear topic-dependent differences in interaction patterns, and a manual qualitative analysis indicates that agent interactions follow realistic patterns of timing, user similarity, and sentiment change.

📝 Abstract
Agent-based modeling (ABM) provides a powerful framework for exploring how individual behaviors and interactions give rise to collective social dynamics. However, most ABMs rely on handcrafted or parameterized agent rules that are not empirically grounded, thereby limiting their realism and validation against observed data. To address this gap, we constructed a large-scale, empirically grounded dataset from Reddit to support the development and evaluation of agent-based social simulations. The dataset includes 33 technology-focused, 14 climate-focused, and 7 COVID-related aggregated agents, encompassing around one million posts and comments. Using publicly available posts and comments, we define agent categories based on content and interaction patterns, derive inter-agent relationships from temporal commenting behaviors, and build a directed, weighted network that reflects empirically observed user connections. The resulting dataset enables researchers to calibrate and benchmark agent behavior, network structure, and information diffusion processes against real social dynamics. Our quantitative analysis reveals clear topic-dependent differences in how users interact. Climate discussions show dense, highly connected networks with sustained engagement, COVID-related interactions are sparse and mostly one-directional, and technology discussions are organized around a small number of central hubs. Manual qualitative analysis further shows that agent interactions follow realistic patterns of timing, similarity between users, and sentiment change.
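The abstract does not spell out how the network is built, so the following is only a minimal sketch of one plausible reading: each reply adds weight to a directed edge from the commenter to the author being replied to. The record layout, the function names (`build_edges`, `density`, `reciprocity`, `in_strength`), and the optional time window are all illustrative assumptions, not the authors' pipeline. The three summary measures loosely correspond to the contrasts the abstract draws: density for "dense, highly connected", reciprocity for "mostly one-directional", and in-strength for "central hubs".

```python
from collections import Counter

# Hypothetical records: (comment_author, parent_author, unix_timestamp).
comments = [
    ("alice", "bob", 100),
    ("bob", "alice", 160),
    ("carol", "bob", 200),
    ("alice", "bob", 260),
    ("dave", "dave", 300),  # self-reply, skipped below
]

def build_edges(records, start=None, end=None):
    """Count replies per ordered (source, target) pair; the count is the edge weight.

    start/end optionally restrict the replies to a time window (assumed design).
    """
    weights = Counter()
    for author, parent, ts in records:
        if author == parent:
            continue  # ignore self-replies
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        weights[(author, parent)] += 1
    return weights

def density(weights):
    """Fraction of possible directed edges that are present."""
    nodes = {node for edge in weights for node in edge}
    n = len(nodes)
    return len(weights) / (n * (n - 1)) if n > 1 else 0.0

def reciprocity(weights):
    """Fraction of directed edges whose reverse edge also exists."""
    if not weights:
        return 0.0
    mutual = sum(1 for (u, v) in weights if (v, u) in weights)
    return mutual / len(weights)

def in_strength(weights):
    """Total incoming reply weight per node; large values suggest hubs."""
    totals = Counter()
    for (_source, target), w in weights.items():
        totals[target] += w
    return totals

edges = build_edges(comments)
print(dict(edges))
print(density(edges), reciprocity(edges), in_strength(edges).most_common(1))
```

Under these assumptions, a low reciprocity value would indicate mostly one-way replies, while a few nodes with very high in-strength would indicate a hub-centered structure.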
Problem

Research questions and friction points this paper is trying to address.

agent-based modeling
empirical grounding
social simulation
interaction networks
online interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

agent-based modeling
empirically grounded dataset
interaction network
online social dynamics
Reddit data
Abdul Sittar
E3 Department, Jožef Stefan Institute, Ljubljana, Slovenia
Miha Česnovar
Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia
Alenka Guček
E3 Department, Jožef Stefan Institute, Ljubljana, Slovenia
Marko Grobelnik
Jožef Stefan Institute, Slovenia
Artificial Intelligence, Machine Learning, Natural Language Processing