🤖 AI Summary
This study addresses the early prediction of conversational derailment in toxic interactions within the GitHub open-source community. To overcome limitations of prior work, namely the absence of fine-grained annotations and domain-specific adaptation, we introduce the first open-source-collaboration-focused dataset, comprising 202 toxic and 696 non-toxic dialogues, and pioneer the manual annotation of precise derailment points. Linguistic analysis identifies key precursors, including second-person pronouns, negations, and affective cues such as "bitter frustration" and "impatience." We propose an LLM-driven dialogue trajectory summarization technique coupled with F1-optimized prompt engineering to enable proactive moderation. Our approach achieves a 69% F1-score on derailment prediction, significantly outperforming conventional baselines, and establishes a novel, interpretable, and deployable moderation paradigm for open-source content governance.
📝 Abstract
Software projects thrive on the involvement and contributions of individuals from different backgrounds. However, toxic language and negative interactions can hinder the participation and retention of contributors and alienate newcomers. Proactive moderation strategies aim to prevent toxicity from occurring by addressing conversations that have derailed from their intended purpose. This study aims to understand and predict conversational derailment leading to toxicity on GitHub. To facilitate this research, we curate a novel dataset comprising 202 toxic conversations from GitHub with annotated derailment points, along with 696 non-toxic conversations as a baseline. Based on this dataset, we identify unique characteristics of toxic conversations and derailment points, including linguistic markers such as second-person pronouns, negation terms, and tones of Bitter Frustration and Impatience, as well as patterns in conversational dynamics between project contributors and external participants. Leveraging these empirical observations, we propose a proactive moderation approach to automatically detect and address potentially harmful conversations before escalation. Utilizing modern LLMs, we develop a conversation trajectory summary technique that captures the evolution of discussions and identifies early signs of derailment. Our experiments demonstrate that LLM prompts tailored to provide summaries of GitHub conversations achieve a 69% F1-score in predicting conversational derailment, substantially outperforming a set of baseline approaches.
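To make the linguistic findings concrete, the sketch below shows a minimal marker-based signal of the kind the study identifies (second-person pronouns and negation terms) applied to a conversation's comments. The word lists, threshold, and function names are illustrative assumptions for exposition; they are not the paper's LLM-based method, which instead prompts a model with a summary of the conversation trajectory.

```python
# Hedged sketch: count derailment precursors reported by the study
# (second-person pronouns, negation terms) in each comment of a
# GitHub conversation. Marker sets and the threshold are assumptions,
# not values taken from the paper.

SECOND_PERSON = {"you", "your", "yours", "you're", "youre"}
NEGATIONS = {"not", "no", "never", "don't", "can't", "won't", "doesn't", "isn't"}

def derailment_signals(comments):
    """Return a per-comment count of linguistic-marker hits."""
    signals = []
    for text in comments:
        tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
        hits = sum(t in SECOND_PERSON or t in NEGATIONS for t in tokens)
        signals.append(hits)
    return signals

def flag_conversation(comments, threshold=3):
    """Flag a conversation if any single comment crosses the threshold."""
    return any(h >= threshold for h in derailment_signals(comments))
```

A heuristic like this could serve as one of the simple baselines the LLM-based trajectory-summary approach is compared against; the 69% F1-score reported above comes from the LLM prompting technique, not from marker counting.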