Dark&Stormy: Modeling Humor in the Worst Sentences Ever Written

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of detecting deliberately crafted “bad humor” in English—a genre where state-of-the-art humor detection models exhibit significant performance degradation. To tackle this, we introduce the first bad-humor corpus derived from the Bulwer-Lytton Fiction Contest, systematically analyzing its structural patterns involving puns, irony, metaphor, and metafictional devices. We conduct the first human–LLM comparative study on bad-humor generation, revealing that LLMs over-rely on specific rhetorical devices and nonce collocations, exposing a rhetorical control bias. Integrating literary rhetorical analysis, controllable prompt engineering, and human–AI collaborative evaluation, we demonstrate that current models lack robust semantic–stylistic disentanglement capabilities for low-quality humor. Our contributions include: (1) a novel, manually annotated bad-humor benchmark; (2) empirical evidence of LLMs’ rhetorical limitations; and (3) open-sourced data and code to advance computational humor research.

📝 Abstract
Textual humor is enormously diverse and computational studies need to account for this range, including intentionally bad humor. In this paper, we curate and analyze a novel corpus of sentences from the Bulwer-Lytton Fiction Contest to better understand "bad" humor in English. Standard humor detection models perform poorly on our corpus, and an analysis of literary devices finds that these sentences combine features common in existing humor datasets (e.g., puns, irony) with metaphor, metafiction and simile. LLMs prompted to synthesize contest-style sentences imitate the form but exaggerate the effect by over-using certain literary devices, and including far more novel adjective-noun bigrams than human writers. Data, code and analysis are available at https://github.com/venkatasg/bulwer-lytton
Problem

Research questions and friction points this paper is trying to address.

Modeling intentionally bad humor in English sentences
Analyzing literary devices in unsuccessful humorous writing
Evaluating computational humor detection on poor-quality examples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated novel corpus from Bulwer-Lytton Fiction Contest
Analyzed literary devices like metaphor and metafiction
Evaluated LLM-generated sentences using adjective-noun bigrams
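The adjective-noun novelty measure mentioned above can be sketched roughly as follows. This is a minimal illustration, not the paper's released code: it assumes tokens have already been POS-tagged upstream (e.g. by an off-the-shelf tagger), and all function names and the toy data are hypothetical.

```python
def adj_noun_bigrams(tagged_tokens):
    """Collect (adjective, noun) pairs from a list of (word, POS) tuples."""
    pairs = set()
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1 == "ADJ" and t2 == "NOUN":
            pairs.add((w1.lower(), w2.lower()))
    return pairs

def novel_bigrams(generated_tokens, reference_bigrams):
    """Adjective-noun bigrams in generated text never seen in the reference corpus."""
    return adj_noun_bigrams(generated_tokens) - reference_bigrams

# Toy usage: "stormy night" is attested in the reference set, "velvet thunder" is not.
reference = {("dark", "night"), ("stormy", "night")}
generated = [("a", "DET"), ("velvet", "ADJ"), ("thunder", "NOUN"),
             ("on", "ADP"), ("a", "DET"), ("stormy", "ADJ"), ("night", "NOUN")]
print(novel_bigrams(generated, reference))  # {('velvet', 'thunder')}
```

Counting how many such unattested pairs an LLM produces, relative to human contest entries, is one way to quantify the over-generation of nonce collocations the summary describes.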