Where Do People Tell Stories Online? Story Detection Across Online Communities

📅 2023-11-16

🏛️ Annual Meeting of the Association for Computational Linguistics

📈 Citations: 11

✨ Influential: 1

🤖 AI Summary

Story detection in online communities faces two key challenges: narrative content is dispersed across heterogeneous communities and is intermixed with non-narrative text within a single post. To address this, we introduce StorySeeker, a toolkit comprising (1) a fine-grained, expert-annotated dataset of 502 Reddit posts and comments, sampled from hundreds of popular English-language communities spanning 33 topic categories; (2) a narrative codebook adapted to the social media context; and (3) models that predict storytelling at both the document and span levels. The annotations include binary story labels and explicitly distinguish *story spans* from *event spans*, which makes it possible to characterize how storytelling is distributed across the platform and to support both inter- and intra-community research. Leveraging BERT/SpanBERT embeddings augmented with linguistically motivated narrative features (e.g., past-tense verbs, first-person agency), our model achieves F1 scores of 0.89 (document-level) and 0.76 (span-level). The identified textual features of storytelling spans enable cross-community narrative analysis, and a case study on r/ChangeMyView illustrates their use for studying storytelling as a persuasive strategy.

📝 Abstract

Story detection in online communities is a challenging task as stories are scattered across communities and interwoven with non-storytelling spans within a single text. We address this challenge by building and releasing the StorySeeker toolkit, including a richly annotated dataset of 502 Reddit posts and comments, a detailed codebook adapted to the social media context, and models to predict storytelling at the document and span levels. Our dataset is sampled from hundreds of popular English-language Reddit communities ranging across 33 topic categories, and it contains fine-grained expert annotations, including binary story labels, story spans, and event spans. We evaluate a range of detection methods using our data, and we identify the distinctive textual features of online storytelling, focusing on storytelling spans. We illuminate distributional characteristics of storytelling on a large community-centric social media platform, and we also conduct a case study on r/ChangeMyView, where storytelling is used as one of many persuasive strategies, illustrating that our data and models can be used for both inter- and intra-community research. Finally, we discuss implications of our tools and analyses for narratology and the study of online communities.

Problem

Research questions and friction points this paper is trying to address.

Detect scattered stories across diverse online communities

Distinguish storytelling from non-storytelling spans in texts

Analyze storytelling features and distributions on social media

Innovation

Methods, ideas, or system contributions that make the work stand out.

StorySeeker toolkit for story detection

Richly annotated dataset from Reddit

Models predicting storytelling at multiple levels
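
Because predictions are made at two levels, document-level and span-level outputs are typically scored separately. The following is a minimal sketch of one way to do that, not the paper's evaluation code: document-level F1 is computed over binary story labels, and span-level F1 is approximated here by character-level overlap between gold and predicted spans. The function names and the character-overlap choice are assumptions made for illustration; the paper may score spans differently (for example, at the token level).

```python
from typing import List, Tuple

# Assumed span format: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive, and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def binary_f1(gold: List[bool], pred: List[bool]) -> float:
    """F1 of the positive class for two aligned lists of binary labels."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    return f1_from_counts(tp, fp, fn)


def spans_to_mask(spans: List[Span], length: int) -> List[bool]:
    """Mark every character covered by at least one span."""
    mask = [False] * length
    for start, end in spans:
        for i in range(max(start, 0), min(end, length)):
            mask[i] = True
    return mask


def span_overlap_f1(gold: List[Span], pred: List[Span], length: int) -> float:
    """Character-level F1 between gold and predicted spans of one text."""
    return binary_f1(spans_to_mask(gold, length), spans_to_mask(pred, length))


if __name__ == "__main__":
    # Document level: one binary story label per post (toy values).
    print(binary_f1([True, True, False, False], [True, False, True, False]))
    # Span level: gold story span vs. a partially overlapping prediction.
    print(span_overlap_f1([(0, 10)], [(5, 15)], length=20))
```

Scoring overlap per character keeps the sketch independent of any tokenizer; swapping in token offsets would only change how the masks are built.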
