🤖 AI Summary
Story detection in online communities faces two key challenges: narrative content is dispersed across heterogeneous communities and intermixed with non-narrative text. To address this, we introduce StorySeeker, a toolkit comprising (1) a fine-grained, manually annotated dataset of 502 Reddit posts and comments sampled from hundreds of popular English-language communities spanning 33 topic categories; (2) a social-media-adapted narrative codebook grounded in narratology; and (3) models for story detection at both the document and span levels. This work establishes the first fine-grained annotation framework for community-heterogeneous settings, with binary story labels and explicitly distinguished *story spans* and *event spans*, thereby uncovering distributional patterns of online narratives and supporting both inter- and intra-community analysis. Leveraging BERT/SpanBERT embeddings augmented with linguistically motivated narrative features (e.g., past-tense verbs, first-person agency), our model achieves F1 scores of 0.89 (document-level) and 0.76 (span-level). The identified narrative markers enable scalable cross-community narrative analysis and support empirical studies of persuasive mechanisms in digital discourse, as in the r/ChangeMyView case study.
📝 Abstract
Story detection in online communities is a challenging task as stories are scattered across communities and interwoven with non-storytelling spans within a single text. We address this challenge by building and releasing the StorySeeker toolkit, including a richly annotated dataset of 502 Reddit posts and comments, a detailed codebook adapted to the social media context, and models to predict storytelling at the document and span levels. Our dataset is sampled from hundreds of popular English-language Reddit communities ranging across 33 topic categories, and it contains fine-grained expert annotations, including binary story labels, story spans, and event spans. We evaluate a range of detection methods using our data, and we identify the distinctive textual features of online storytelling, focusing on storytelling spans. We illuminate distributional characteristics of storytelling on a large community-centric social media platform, and we also conduct a case study on r/ChangeMyView, where storytelling is used as one of many persuasive strategies, illustrating that our data and models can be used for both inter- and intra-community research. Finally, we discuss implications of our tools and analyses for narratology and the study of online communities.
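To make the three annotation layers concrete (binary story label, story spans, event spans), the following is a minimal sketch of how one annotated text might be represented and how a predicted story span could be scored against a gold one. The `AnnotatedText` class, its field names, the example text, and the character-overlap F1 are all illustrative assumptions; they are not the StorySeeker release format, nor necessarily the metric used in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Span = Tuple[int, int]  # hypothetical convention: [start, end) character offsets


@dataclass
class AnnotatedText:
    """Hypothetical container for one annotated post or comment."""
    text: str
    is_story: bool                                         # binary document-level label
    story_spans: List[Span] = field(default_factory=list)  # storytelling regions
    event_spans: List[Span] = field(default_factory=list)  # events inside the story


def covered_chars(spans: List[Span]) -> Set[int]:
    """Return the set of character positions covered by any span."""
    covered: Set[int] = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered


def char_f1(gold: List[Span], pred: List[Span]) -> float:
    """Character-overlap F1 between gold and predicted spans (an assumed metric)."""
    gold_chars, pred_chars = covered_chars(gold), covered_chars(pred)
    if not gold_chars and not pred_chars:
        return 1.0  # both agree that there is no story span
    overlap = len(gold_chars & pred_chars)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


# Invented example: a comment whose middle sentence is a story.
text = "Thanks for asking. Last year I moved to Denver and lost my job. Any advice?"
story = "Last year I moved to Denver and lost my job."
start = text.index(story)
example = AnnotatedText(
    text=text,
    is_story=True,
    story_spans=[(start, start + len(story))],
)

# A hypothetical prediction that captures only the first clause of the story.
predicted = [(start, start + len("Last year I moved to Denver"))]
score = char_f1(example.story_spans, predicted)
```

Character offsets keep the story visibly separated from the surrounding non-storytelling text ("Thanks for asking.", "Any advice?"), which is the interleaving the abstract describes. A token-level variant would work the same way once a tokenizer is fixed.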