AI Summary
This study investigates the dynamic evolution of toxic discourse surrounding U.S. immigration on social media. Methodologically, we propose a hierarchical topic discovery framework integrating instruction-tuned embeddings with recursive HDBSCAN, modeling user posting trajectories within a five-dimensional semantic space; trajectory variance analysis and permutation-based MANOVA enable statistically rigorous cross-group comparisons. Applied to 4 million posts, the approach identifies 157 fine-grained subtopics and reveals divergent behavioral shifts: users exhibiting rising toxicity increasingly adopt fear- and panic-driven narratives, whereas those showing declining toxicity concentrate on legal and policy-related themes, with both trajectories significantly deviating from control-group baselines. Our contribution is the first integrated framework combining large-scale dynamic topic modeling, quantified user trajectory analysis, and formal statistical inference, yielding a reproducible, interpretable, and scalable quantitative paradigm for analyzing polarization dynamics in sociopolitical discourse.
Abstract
In the online public sphere, discussions about immigration often become increasingly fractious, marked by toxic language and polarization. Drawing on 4 million X posts collected over six months, we combine user-centric and topic-centric approaches to study how shifts in toxicity manifest as topical shifts. Our topic discovery method, which leverages instruction-based embeddings and recursive HDBSCAN, uncovers 157 fine-grained subtopics within the U.S. immigration discourse. We focus on four groups of users: (1) those with increasing toxicity, (2) those with decreasing toxicity, and two reference groups with no significant toxicity trend but matched toxicity levels. Treating each posting history as a trajectory through a five-dimensional topic space, we compare average group trajectories using permutational MANOVA. Our findings show that users with increasing toxicity drift toward alarmist, fear-based frames, whereas those with decreasing toxicity pivot toward legal and policy-focused themes. Both patterns differ significantly from those of their reference groups. This pipeline, which combines hierarchical topic discovery with trajectory analysis, offers a replicable method for studying dynamic conversations around social issues at scale.
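The group comparison described above can be illustrated with a minimal sketch of permutational MANOVA. This is not the authors' implementation. It assumes each user's trajectory has already been flattened into a fixed-length vector (here, a hypothetical 6 time windows times 5 topic dimensions), uses Euclidean distance, and runs on synthetic data; the function names `pseudo_f` and `permanova` and the group labels are illustrative only.

```python
import numpy as np

def pseudo_f(sq_dist, labels):
    """Pseudo-F statistic computed from a matrix of squared pairwise distances."""
    n = len(labels)
    groups = np.unique(labels)
    # Total sum of squares: squared distances over unordered pairs, divided by N.
    ss_total = sq_dist[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares: same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = sq_dist[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(features, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) for a one-way group comparison."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Squared Euclidean distances between all pairs of users (an assumption;
    # any other distance could be substituted here).
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    rng = np.random.default_rng(seed)
    observed = pseudo_f(sq_dist, labels)
    # Shuffle group labels to build the null distribution of pseudo-F.
    exceed = 0
    for _ in range(n_perm):
        if pseudo_f(sq_dist, rng.permutation(labels)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Synthetic example: 30 users per group, 6 windows x 5 topic dimensions each.
rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, size=(30, 6 * 5))
rising = rng.normal(1.0, 1.0, size=(30, 6 * 5))  # shifted group, by construction
features = np.vstack([reference, rising])
labels = np.array(["reference"] * 30 + ["rising"] * 30)

f_stat, p_value = permanova(features, labels)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

Because the test works on pairwise distances and label permutations, it makes no multivariate normality assumption, which is one reason it suits trajectory data of this kind.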