🤖 AI Summary
This study addresses the challenge of identifying science fiction and fantasy content in Wikipedia, where genre boundaries are often ambiguous and community-generated annotations can be biased. It proposes a cross-genre classification model that combines structural cues from Wikipedia (categories, internal links, and Wikidata statements) with semantic features. By evaluating these structural and semantic indicators, the research identifies key signals that distinguish science fiction from fantasy and improves the accuracy of cross-genre content identification. The approach offers a methodology for large-scale typological analysis of digital texts and for categorizing narrative genres in complex, collaboratively edited knowledge repositories.
📝 Abstract
Identifying which Wikipedia articles are related to science fiction, fantasy, or their hybrids is challenging because genre boundaries are porous and frequently overlap. Wikipedia nonetheless offers machine-readable structure beyond text, including categories, internal links (wikilinks), and statements of the corresponding Wikidata items. However, each of these signals reflects community conventions and can be biased or incomplete. This study examines structural and semantic features of Wikipedia articles that can be used to identify content related to science fiction and fantasy (SF/F).
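
To make the structural signals concrete, the following is a minimal sketch of how categories and wikilinks might be pulled from an article's raw wikitext. It is an illustration, not the study's pipeline: the function names, the regular expression, and the `SFF_KEYWORDS` list are all assumptions introduced here, and Wikidata statements are left out because they live outside the article text.

```python
import re

# Hypothetical keyword list for illustration; the study does not specify one.
SFF_KEYWORDS = ("science fiction", "fantasy")

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]],
# capturing only the target page name.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_structure(wikitext):
    """Split the double-bracket links in wikitext into categories and wikilinks."""
    categories, links = [], []
    for match in WIKILINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        if target.lower().startswith("category:"):
            # Drop the "Category:" namespace prefix.
            categories.append(target[len("category:"):].strip())
        else:
            links.append(target)
    return {"categories": categories, "links": links}


def sff_category_hits(categories):
    """Return the categories whose names contain an assumed SF/F keyword."""
    return [c for c in categories if any(k in c.lower() for k in SFF_KEYWORDS)]
```

A keyword match on category names is exactly the kind of signal the abstract warns can be biased or incomplete, so a sketch like this would serve at most as one feature among several rather than as a classifier in its own right.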