🤖 AI Summary
This study addresses the challenge of efficiently discovering cybercriminal communities on Telegram. It proposes TeleHunt, a modular content discovery framework that combines reference-driven snowball sampling, message-level text classification, contextual filtering, and market-segment labeling. By systematically comparing seed sources (open web and dark web), pointer types, and exploration strategies, the work offers the first systematic comparison of Telegram discovery strategies, together with an empirical characterization of how accessibility differs across criminal market segments. Strategy performance is quantified along three dimensions: efficiency, accessibility, and rediscovery. The evaluation draws on a labeled dataset of 6,022 communities and over 172 million messages. The research contributes a reusable discovery pipeline and a methodological foundation for studying cybercriminal ecosystems on Telegram.
📝 Abstract
This paper presents TeleHunt, a framework and tool for evaluating how effectively different strategies discover cybercriminal communities on Telegram. TeleHunt employs a set of reference-driven snowballing strategies that integrate message-level classification, contextual filtering, and market-segment labeling. Using open- and dark-web seeds, we systematically evaluate how seed source, pointer type, and exploration strategy influence discovery outcomes along three dimensions: efficiency, accessibility, and rediscovery. Our work provides (i) a modular cybercrime content discovery pipeline, (ii) the first systematic comparison of Telegram discovery strategies with an empirical characterization of market-segment accessibility, and (iii) a labeled dataset of over 172 million messages from 6,022 Telegram communities.
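
To make the idea of reference-driven snowballing concrete, the following is a minimal sketch and not TeleHunt's actual implementation. It assumes a breadth-first exploration over an in-memory toy corpus, uses `@name` mentions as the only pointer type, and substitutes a keyword match for the message-level classifier. Every name here (`snowball`, `TOY_CORPUS`, `is_relevant`, `extract_pointers`) is hypothetical and chosen only for illustration.

```python
import re
from collections import deque
from typing import Callable, Dict, Iterable, List

# Hypothetical toy corpus: community name -> messages (not real data).
TOY_CORPUS: Dict[str, List[str]] = {
    "seed_a": ["selling fresh logs, join @market_b", "hello"],
    "market_b": ["cc dumps available, see @market_c and @cats"],
    "market_c": ["logs for sale, back to @seed_a"],
    "cats": ["cute cat pictures @market_d"],
    "market_d": ["dumps for sale"],
}

KEYWORDS = {"logs", "dumps"}  # stand-in for a trained message classifier


def is_relevant(message: str) -> bool:
    """Assumed message-level classifier: a plain keyword match."""
    text = message.lower()
    return any(keyword in text for keyword in KEYWORDS)


def extract_pointers(message: str) -> List[str]:
    """Assumed pointer type: @mentions of other communities."""
    return re.findall(r"@(\w+)", message)


def snowball(
    seeds: Iterable[str],
    get_messages: Callable[[str], List[str]],
    max_communities: int = 100,
) -> List[str]:
    """Breadth-first snowball: expand only from messages judged relevant."""
    visited = set()
    queue = deque(seeds)
    discovered: List[str] = []
    while queue and len(discovered) < max_communities:
        community = queue.popleft()
        if community in visited:
            continue
        visited.add(community)
        relevant = [m for m in get_messages(community) if is_relevant(m)]
        if not relevant:
            continue  # filtered out: do not follow its pointers
        discovered.append(community)
        for message in relevant:
            for target in extract_pointers(message):
                if target not in visited:
                    queue.append(target)
    return discovered


if __name__ == "__main__":
    print(snowball(["seed_a"], lambda c: TOY_CORPUS.get(c, [])))
```

In this toy run, `market_d` is never reached because its only inbound pointer sits in a community that the filter rejects. This is one simple way in which the choice of seeds, pointers, and filtering could shape which communities a crawl can access.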