TeleScope: A Longitudinal Dataset for Investigating Online Discourse and Information Interaction on Telegram

📅 2025-04-28
🤖 AI Summary
This work addresses the lack of large-scale, longitudinal Telegram datasets, a bottleneck for cross-domain research on topics such as information diffusion and extremist content. It introduces TeleScope, which the authors describe as, to their knowledge, the largest dataset of its kind: metadata for about 500K channels and message metadata for about 120M messages crawled from about 71K public channels. The release also includes channel connections and user interaction data derived from Telegram's message-forwarding feature, together with three enrichments: language detection, active posting periods per channel, and Telegram entities extracted from messages. The dataset is designed to be independent of specific research objectives and to support downstream work such as studying information spread, analyzing forwarding patterns, and replicating social media studies previously conducted on platforms like X (formerly Twitter).

📝 Abstract
Telegram is a globally popular instant messaging platform known for its strong emphasis on security, privacy, and unique social networking features. It has recently become the subject of cross-domain research on topics such as social media influence, propaganda, and extremism. This paper introduces TeleScope, an extensive dataset suite that, to our knowledge, is the largest of its kind. It comprises metadata for about 500K Telegram channels and downloaded message metadata for about 71K public channels, accounting for around 120M crawled messages. We also release channel connections and user interaction data built using Telegram's message-forwarding feature to study multiple use cases, such as information spread and message forwarding patterns. In addition, we provide data enrichments, such as language detection, active message posting periods for each channel, and Telegram entities extracted from messages, that enable online discourse analysis beyond what is possible with the original data alone. The dataset is designed for diverse applications, independent of specific research objectives, and sufficiently versatile to facilitate the replication of social media studies comparable to those conducted on platforms like X (formerly Twitter).
Problem

Research questions and friction points this paper is trying to address.

Lack of a large-scale dataset for analyzing online discourse on Telegram
How information spreads and how messages are forwarded between channels
Replicating on Telegram the social media studies conducted on platforms like X
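
As a rough illustration of how forwarding data can be turned into channel connections, the sketch below counts forwards as weighted directed edges. The record layout and the field names `channel_id` and `forwarded_from` are assumptions made for this example, not the dataset's actual schema.

```python
from collections import Counter

# Hypothetical message records; the field names are assumptions,
# not the dataset's actual schema.
messages = [
    {"channel_id": "A", "forwarded_from": "B"},
    {"channel_id": "A", "forwarded_from": "B"},
    {"channel_id": "C", "forwarded_from": "A"},
    {"channel_id": "C", "forwarded_from": None},  # original post, not a forward
]


def build_forward_edges(records):
    """Count forwards as weighted directed edges (source -> forwarding channel)."""
    edges = Counter()
    for rec in records:
        source = rec.get("forwarded_from")
        # Skip original posts and self-forwards.
        if source is not None and source != rec["channel_id"]:
            edges[(source, rec["channel_id"])] += 1
    return edges


def forwarded_counts(edges):
    """Total number of times each channel's content was forwarded by others."""
    totals = Counter()
    for (source, _target), weight in edges.items():
        totals[source] += weight
    return totals


edges = build_forward_edges(messages)
totals = forwarded_counts(edges)
```

Here `edges` maps each (source, forwarding channel) pair to a forward count, and `totals` gives a simple per-channel measure of how often content was picked up elsewhere.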
Innovation

Methods, ideas, or system contributions that make the work stand out.

Largest Telegram dataset to the authors' knowledge, with metadata for about 500K channels
Includes message-forwarding data for interaction analysis
Enriched with language detection and activity periods
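
One plausible reading of the activity-period enrichment is a per-channel summary of first and last posting times, plus a count of posts per hour of day. The sketch below shows that idea on made-up data; the input layout of (channel id, ISO 8601 timestamp) pairs is an assumption, and the dataset's own definition of an active period may differ.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical (channel_id, ISO 8601 timestamp) pairs; illustrative only.
posts = [
    ("A", "2024-01-05T08:15:00+00:00"),
    ("A", "2024-03-20T21:40:00+00:00"),
    ("A", "2024-02-11T08:55:00+00:00"),
    ("B", "2023-12-31T23:59:00+00:00"),
]


def activity_summary(rows):
    """Per channel: first post, last post, and post counts per UTC hour."""
    summary = defaultdict(
        lambda: {"first": None, "last": None, "by_hour": [0] * 24}
    )
    for channel_id, stamp in rows:
        # Normalize every timestamp to UTC before comparing or bucketing.
        ts = datetime.fromisoformat(stamp).astimezone(timezone.utc)
        entry = summary[channel_id]
        if entry["first"] is None or ts < entry["first"]:
            entry["first"] = ts
        if entry["last"] is None or ts > entry["last"]:
            entry["last"] = ts
        entry["by_hour"][ts.hour] += 1
    return dict(summary)


summary = activity_summary(posts)
```

The `first` and `last` values bound a channel's observed posting period, while `by_hour` gives a coarse view of when during the day it tends to post.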
Susmita Gangopadhyay
GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany
Danilo Dessi
Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, UAE
Dimitar Dimitrov
GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany
Stefan Dietze
Full Professor (Heinrich-Heine-University Düsseldorf) & Scientific Director (KTS, GESIS)
Knowledge Graphs, Information Retrieval, Web Science, NLP