Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case

📅 2025-07-14
🤖 AI Summary
This study examines whether large language models (LLMs) can perform expert-level qualitative thematic analysis of social media content, using xylazine wound management as the use case. The task is inductive and depends on clinical domain expertise. To address the limitations of conventional multi-label classification for low-frequency, specialized themes, the study frames the task as a series of binary classifications, one per theme, combined with few-shot prompting. Zero-shot, one-shot, and few-shot prompting strategies are compared; GPT-4o with two-shot prompting achieves 90.9% accuracy and an F1-score of 0.71 on the validation set. Thematic distributions generated by the model agree strongly with expert human coding (Spearman’s ρ > 0.9). The approach offers a reproducible, scalable way to apply LLMs to large-scale qualitative analysis while retaining domain-specific interpretive rigor.

📝 Abstract
Background: Large language models (LLMs) face challenges in inductive thematic analysis, a task requiring deep interpretive and domain-specific expertise. We evaluated the feasibility of using LLMs to replicate expert-driven thematic analysis of social media data.

Methods: Using two temporally non-intersecting Reddit datasets on xylazine (n=286 and n=686, for model optimization and validation, respectively) with twelve expert-derived themes, we evaluated five LLMs against expert coding. We modeled the task as a series of binary classifications, rather than a single, multi-label classification, employing zero-, single-, and few-shot prompting strategies and measuring performance via accuracy, precision, recall, and F1-score.

Results: On the validation set, GPT-4o with two-shot prompting performed best (accuracy: 90.9%; F1-score: 0.71). For high-prevalence themes, model-derived thematic distributions closely mirrored expert classifications (e.g., xylazine use: 13.6% vs. 17.8%; MOUD use: 16.5% vs. 17.8%).

Conclusions: Our findings suggest that few-shot LLM-based approaches can automate thematic analyses, offering a scalable supplement for qualitative research.

Keywords: thematic analysis, large language models, natural language processing, qualitative analysis, social media, prompt engineering, public health

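The binary-classification framing described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the theme definitions, and the helper names `build_prompt`, `code_post`, and `ask` are all assumptions made for the example. `ask` stands in for any function that sends a prompt to an LLM and returns its text reply.

```python
from typing import Callable, Dict, List, Tuple

# (post text, whether the theme applies) -- hypothetical labeled example format
Example = Tuple[str, bool]


def build_prompt(theme: str, definition: str,
                 examples: List[Example], post: str) -> str:
    """Assemble a yes/no prompt for ONE theme.

    An empty `examples` list gives a zero-shot prompt; two examples give
    a two-shot prompt. The wording is illustrative only.
    """
    lines = [
        f"Theme: {theme}",
        f"Definition: {definition}",
        "Answer 'yes' if the post expresses this theme, otherwise 'no'.",
        "",
    ]
    for text, label in examples:
        lines += [f"Post: {text}", f"Answer: {'yes' if label else 'no'}", ""]
    lines += [f"Post: {post}", "Answer:"]
    return "\n".join(lines)


def code_post(post: str,
              themes: Dict[str, str],
              shots: Dict[str, List[Example]],
              ask: Callable[[str], str]) -> Dict[str, bool]:
    """Code one post by asking a separate binary question per theme,
    instead of asking for all themes in a single multi-label query."""
    results: Dict[str, bool] = {}
    for theme, definition in themes.items():
        prompt = build_prompt(theme, definition, shots.get(theme, []), post)
        reply = ask(prompt)
        # Treat any reply that starts with "yes" as a positive label.
        results[theme] = reply.strip().lower().startswith("yes")
    return results
```

With twelve themes, this design issues twelve model calls per post. The trade-off is more calls in exchange for a simpler decision at each step, which is the motivation the abstract gives for avoiding a single multi-label classification.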
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for automated thematic analysis of social media data
Assessing LLM performance in replicating expert-driven thematic classifications
Developing scalable few-shot prompting strategies for qualitative research
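Assessing agreement with expert coding relies on the metrics named in the abstract: accuracy, precision, recall, and F1-score, plus the share of posts assigned to each theme. The sketch below computes them for a single theme from paired expert and model labels. The function names and the sample labels are made up for illustration and are not data from the paper.

```python
from typing import Dict, List


def binary_metrics(gold: List[bool], pred: List[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one theme's yes/no labels."""
    tp = sum(g and p for g, p in zip(gold, pred))          # true positives
    fp = sum((not g) and p for g, p in zip(gold, pred))    # false positives
    fn = sum(g and (not p) for g, p in zip(gold, pred))    # false negatives
    tn = sum((not g) and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def prevalence(labels: List[bool]) -> float:
    """Fraction of posts labeled with the theme."""
    return sum(labels) / len(labels) if labels else 0.0


# Made-up labels for a low-frequency theme (2 of 10 posts are positive).
expert = [True, True] + [False] * 8
model = [True, False, True] + [False] * 7
scores = binary_metrics(expert, model)
```

In this toy case the accuracy is well above the F1-score because negatives dominate. That pattern is common for low-frequency themes and is one reason to report F1 alongside accuracy.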
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs for inductive thematic analysis
Binary classification with few-shot prompting
GPT-4o with two-shot prompting reaches 90.9% accuracy (F1-score: 0.71) on the validation set
JaMor Hairston
Emory University, Atlanta, GA
Ritvik Ranjan
Wheeler High School, Marietta, GA
Sahithi Lakamana
Emory University, Atlanta, GA
Anthony Spadaro
Rutgers New Jersey Medical School, Newark, NJ
Selen Bozkurt
Emory University, Atlanta, GA
Jeanmarie Perrone
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
Abeed Sarker
Emory University School of Medicine
Natural Language Processing, Biomedical Informatics, Health Data Science, Applied Machine Learning