🤖 AI Summary
This work addresses the limitations of traditional online support group formation methods (poor scalability, static categorization, and insufficient personalization) by proposing two novel models, gDMR and gSTM. These models integrate network node embeddings with topic modeling, incorporating group-specific covariates and sparsity constraints to jointly model users’ textual content, demographic attributes, and social network structure. The resulting framework enables the automatic formation of semantically coherent and personalized support groups. Evaluated on a dataset of over two million posts from MedHelp.org, the proposed approach significantly outperforms baseline methods such as LDA and DMR in predictive accuracy, semantic coherence, and within-group homogeneity. Manual validation further confirms its practical utility in real-world applications.
📝 Abstract
Online health communities (OHCs) are vital for fostering peer support and improving health outcomes. Support groups within these platforms can provide more personalized and cohesive peer support, yet traditional support group formation methods face challenges related to scalability, static categorization, and insufficient personalization. To overcome these limitations, we propose two novel machine learning models for automated support group formation: the Group specific Dirichlet Multinomial Regression (gDMR) and the Group specific Structured Topic Model (gSTM). These models integrate user-generated textual content, demographic profiles, and interaction data, the latter represented through node embeddings derived from user networks, to automate the formation of personalized, semantically coherent support groups.
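As a rough illustration of how user covariates can enter a topic model, the sketch below follows the standard Dirichlet-multinomial regression idea, in which each user's Dirichlet prior over topics is a log-linear function of that user's feature vector. It is not the gDMR model itself: the group-specific terms are not shown, and the dimensions, the random feature values, and names such as `dmr_document_prior` are assumptions made for illustration only.

```python
import numpy as np


def dmr_document_prior(covariates, lam):
    """Standard DMR prior: alpha[d, k] = exp(x_d . lambda_k)."""
    return np.exp(covariates @ lam)


rng = np.random.default_rng(0)

# Hypothetical sizes; none of these come from the paper.
n_users, n_demo, n_embed, n_topics = 4, 3, 8, 5

# Placeholder features: encoded demographics and network node embeddings.
demographics = rng.normal(size=(n_users, n_demo))
node_embeddings = rng.normal(size=(n_users, n_embed))
intercept = np.ones((n_users, 1))

# One covariate vector per user: [1, demographics, node embedding].
covariates = np.hstack([intercept, demographics, node_embeddings])

# Regression weights, one column per topic (random here, learned in practice).
lam = rng.normal(scale=0.1, size=(covariates.shape[1], n_topics))

alpha = dmr_document_prior(covariates, lam)

# Prior mean of each user's topic proportions under Dirichlet(alpha_d).
expected_topic_share = alpha / alpha.sum(axis=1, keepdims=True)
```

In this formulation, users with similar demographics and similar network positions receive similar topic priors, which is what lets the non-textual signals shape the resulting groups.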
We evaluate the models on a large-scale dataset from MedHelp.org, comprising over 2 million user posts. Both models substantially outperform baseline methods including LDA, DMR, and STM in predictive accuracy (held-out log-likelihood), semantic coherence (UMass metric), and internal group consistency. The gDMR model yields group covariates that facilitate practical implementation by leveraging relational patterns from network structures and demographic data. In contrast, gSTM emphasizes sparsity constraints to generate more distinct and thematically specific groups. Qualitative analysis further validates the alignment between model-generated groups and manually coded themes, showing the practical relevance of the models in informing groups that address diverse health concerns such as chronic illness management, diagnostic uncertainty, and mental health. By reducing reliance on manual curation, these frameworks provide scalable solutions that enhance peer interactions within OHCs, with implications for patient engagement, community resilience, and health outcomes.
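For readers unfamiliar with the UMass coherence metric mentioned above, the following is a minimal sketch of its usual definition: for a topic's top words, ranked by probability, it sums log((D(w_later, w_earlier) + 1) / D(w_earlier)) over all ordered pairs, where D counts the documents containing the given words. The toy corpus and the function name `umass_coherence` are made up for illustration, and the paper's exact evaluation settings (number of top words, preprocessing) are not stated in the abstract.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic; scores closer to zero are more coherent."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    # top_words is assumed to be sorted by descending topic probability.
    for i, j in combinations(range(len(top_words)), 2):
        earlier, later = top_words[i], top_words[j]
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # assumption: skip words that never occur in the corpus
        score += math.log((doc_freq(later, earlier) + 1) / denom)
    return score


# Toy corpus (invented): each document is a list of tokens.
docs = [
    ["pain", "sleep", "anxiety"],
    ["pain", "sleep"],
    ["anxiety", "panic"],
    ["pain", "dose"],
]

coherent = umass_coherence(["pain", "sleep"], docs)
incoherent = umass_coherence(["pain", "panic"], docs)
```

Words that frequently appear in the same posts push the score toward zero, while words that rarely co-occur make it more negative, so `coherent` is higher than `incoherent` on this toy corpus.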