🤖 AI Summary
Telegram’s privacy-preserving design creates governance challenges, including fake accounts, cloned and impersonating channels, and the spread of illicit content. This study conducts a large-scale empirical analysis of 35,382 public channels and over 130 million messages, integrating web crawling with data cleaning, supervised learning (feature engineering + classification modeling), social network analysis, and cross-channel content provenance tracing. It defines and characterizes two kinds of deceptive channels: clones, which republish the exact content of another channel, and fakes, which impersonate celebrities or well-known services. It also proposes a machine learning model that detects fake channels with 86% accuracy. The analysis of unmarked channels uncovers carding, illegal adult content, and pirated copyright-protected material, activities also found on privacy-preserving Dark Web services. Finally, it examines Sabmyk, a conspiracy theory that exploited fakes and clones to reach over one million users.
📝 Abstract
Telegram is one of the most widely used instant messaging apps worldwide. Part of its success lies in providing strong privacy protection and social network features such as channels: virtual rooms in which only the admins can post, broadcasting messages to all subscribers. However, these same features have contributed to the emergence of borderline activities and, as is common with Online Social Networks, a heavy presence of fake accounts. Telegram has started to address these issues by introducing verified and scam marks for channels. Unfortunately, the problem is far from solved. In this work, we perform a large-scale analysis of Telegram by collecting 35,382 different channels and over 130,000,000 messages. We study the channels that Telegram marks as verified or scam, highlighting their similarities and differences. Then, we move to the unmarked channels. Here, we find some of the infamous activities also present on privacy-preserving services of the Dark Web, such as carding and the sharing of illegal adult and copyright-protected content. In addition, we identify and analyze two other types of channels: clones and fakes. Clones are channels that publish the exact content of another channel to gain subscribers and promote services. Fakes, by contrast, are channels that attempt to impersonate celebrities or well-known services. Fakes are hard to identify, even for the most advanced users. To detect fake channels automatically, we propose a machine learning model that identifies them with an accuracy of 86%. Lastly, we study Sabmyk, a conspiracy theory that exploited fakes and clones to spread quickly on the platform, reaching over 1,000,000 users.