🤖 AI Summary
Off-topic or toxic interactions in open-source community discussions undermine contributor engagement and project sustainability, yet manual moderation is inherently unscalable. To address this, we introduce the first dedicated dataset for derailment prediction in GitHub discussion threads and propose a two-stage large language model (LLM) prompting framework built on Qwen and Llama models. Stage one uses Least-to-Most prompting to elicit structured, stepwise reasoning about how tension evolves across multi-turn dialogues, producing Summaries of Conversation Dynamics (SCDs) that integrate tension-trigger detection with sentiment-shift analysis. Stage two uses these summaries to estimate the probability of derailment. Our method achieves an F1-score of 0.901 on the curated dataset and 0.797 under external validation, substantially outperforming conventional NLP baselines and enabling accurate, interpretable, and proactive community governance.
📝 Abstract
Toxic interactions in Open Source Software (OSS) communities reduce contributor engagement and threaten project sustainability. Preventing such toxicity before it emerges requires a clear understanding of how harmful conversations unfold. However, most proactive moderation strategies are manual, requiring significant time and effort from community maintainers. To support more scalable approaches, we curate a dataset of 159 derailed toxic threads and 207 non-toxic threads from GitHub discussions. Our analysis reveals that toxicity can be forecast by tension triggers, sentiment shifts, and specific conversational patterns.
We present a novel Large Language Model (LLM)-based framework for predicting conversational derailment on GitHub using a two-step prompting pipeline. First, we generate *Summaries of Conversation Dynamics* (SCDs) via Least-to-Most (LtM) prompting; then we use these summaries to estimate the *likelihood of derailment*. Evaluated on Qwen and Llama models, our LtM strategy achieves F1-scores of 0.901 and 0.852, respectively, at a decision threshold of 0.3, outperforming established NLP baselines for conversation derailment prediction. External validation on a dataset of 308 GitHub issue threads (65 toxic, 243 non-toxic) yields an F1-score of up to 0.797. Our findings demonstrate the effectiveness of structured LLM prompting for early detection of conversational derailment in OSS, enabling proactive and explainable moderation.
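The two-step pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the prompt wording, the `call_llm` function (stubbed here so the control flow runs end to end), and the helper names are all assumptions; only the two-step structure (LtM prompting to produce an SCD, then probability estimation from that summary) and the 0.3 decision threshold come from the abstract.

```python
# Hypothetical sketch of the paper's two-step prompting pipeline.
# `call_llm` stands in for any chat-completion client (e.g. a Qwen or
# Llama endpoint); here it is stubbed with canned responses.

LTM_SCD_PROMPT = (
    "Break the conversation into turns, then step by step identify "
    "tension triggers and sentiment shifts, and summarize the "
    "conversation dynamics:\n\n{thread}"
)

DERAILMENT_PROMPT = (
    "Given this Summary of Conversation Dynamics:\n{scd}\n\n"
    "Estimate the probability (0.0-1.0) that the conversation derails "
    "into toxicity. Answer with a single number."
)

DECISION_THRESHOLD = 0.3  # threshold reported in the abstract


def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM inference call."""
    if "probability" in prompt:
        return "0.65"
    return "Rising tension after a dismissed bug report."


def predict_derailment(thread: str) -> tuple[str, float, bool]:
    # Step 1: Least-to-Most prompting produces the SCD summary.
    scd = call_llm(LTM_SCD_PROMPT.format(thread=thread))
    # Step 2: the summary conditions the derailment-probability estimate.
    prob = float(call_llm(DERAILMENT_PROMPT.format(scd=scd)))
    return scd, prob, prob >= DECISION_THRESHOLD


scd, prob, derailed = predict_derailment(
    "A: This patch is wrong.\nB: Did you even read it?"
)
print(derailed)  # stub returns 0.65 >= 0.3, so True
```

Keeping summarization and scoring as separate calls mirrors the paper's design: the intermediate SCD is human-readable, which is what makes the resulting moderation decision explainable rather than a bare probability.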