🤖 AI Summary
This study investigates whether large language models (LLMs) can perform expert-level qualitative thematic analysis of social media content, focusing on Reddit posts about xylazine, an inductive task that requires deep domain expertise. To address the limitations of conventional multi-label classification for low-frequency, specialized themes, the authors frame the task as an iterative series of binary classifications, one per theme, combined with few-shot prompting. They evaluate five LLMs under zero-shot, one-shot, and few-shot prompting strategies; GPT-4o with two-shot prompting performs best, reaching 90.9% accuracy and an F1 score of 0.71 on the validation set. Thematic distributions generated by the model exhibit strong agreement with expert human coding (Spearman’s ρ > 0.9). The findings suggest that few-shot LLM-based approaches can serve as a reproducible, scalable supplement to expert coding in large-scale qualitative analysis.
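The per-theme binary framing described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the prompt wording, the theme definitions, the example posts, and the `ask` callable (a stand-in for whatever LLM client is used) are all assumptions introduced here. Only the theme names "xylazine use" and "MOUD use" come from the abstract.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, bool]  # (post text, whether the theme applies)


def build_prompt(theme: str, definition: str,
                 examples: List[Example], post: str) -> str:
    """Assemble a yes/no prompt for a single theme.

    An empty `examples` list gives a zero-shot prompt; two examples
    give a two-shot prompt. The wording is hypothetical.
    """
    lines = [
        f"Theme: {theme}",
        f"Definition: {definition}",
        "Answer 'yes' if the post expresses this theme, otherwise 'no'.",
    ]
    for text, label in examples:
        lines.append(f"Post: {text}")
        lines.append(f"Answer: {'yes' if label else 'no'}")
    lines.append(f"Post: {post}")
    lines.append("Answer:")
    return "\n".join(lines)


def code_post(post: str,
              themes: Dict[str, str],
              shots: Dict[str, List[Example]],
              ask: Callable[[str], str]) -> Dict[str, bool]:
    """Run one independent binary decision per theme for a single post."""
    decisions = {}
    for theme, definition in themes.items():
        prompt = build_prompt(theme, definition, shots.get(theme, []), post)
        decisions[theme] = ask(prompt).strip().lower().startswith("yes")
    return decisions


def keyword_stub(prompt: str) -> str:
    """Toy stand-in for an LLM call (assumes single-line posts):
    says 'yes' when the first word of the theme occurs in the target post."""
    lines = prompt.splitlines()
    theme_word = lines[0].split(": ", 1)[1].split()[0].lower()
    target_post = lines[-2].split(": ", 1)[1].lower()
    return "yes" if theme_word in target_post else "no"


# Hypothetical definitions and examples, for illustration only.
themes = {
    "xylazine use": "The author describes using xylazine.",
    "MOUD use": "The author describes using medications for opioid use disorder.",
}
shots = {
    "xylazine use": [
        ("Pretty sure there was xylazine in my last batch.", True),
        ("Does anyone know a good clinic nearby?", False),
    ],
}
demo_post = "I think my supply had xylazine in it."
demo_prompt = build_prompt("xylazine use", themes["xylazine use"],
                           shots["xylazine use"], demo_post)
demo_codes = code_post(demo_post, themes, shots, keyword_stub)
print(demo_codes)
```

Because each theme gets its own prompt, a rare theme is judged on its own definition and examples rather than competing with frequent themes inside a single multi-label answer. The cost is one model call per theme per post.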
📝 Abstract
Background: Large language models (LLMs) face challenges in inductive thematic analysis, a task requiring deep interpretive and domain-specific expertise. We evaluated the feasibility of using LLMs to replicate expert-driven thematic analysis of social media data.
Methods: Using two temporally non-overlapping Reddit datasets on xylazine (n=286 for model optimization and n=686 for validation) with twelve expert-derived themes, we evaluated five LLMs against expert coding. We modeled the task as a series of binary classifications rather than a single multi-label classification, employing zero-, one-, and few-shot prompting strategies and measuring performance via accuracy, precision, recall, and F1-score.
Results: On the validation set, GPT-4o with two-shot prompting performed best (accuracy: 90.9%; F1-score: 0.71). For high-prevalence themes, model-derived thematic distributions closely mirrored expert classifications (e.g., xylazine use: 13.6% vs. 17.8%; MOUD use: 16.5% vs. 17.8%).
Conclusions: Our findings suggest that few-shot LLM-based approaches can automate thematic analyses, offering a scalable supplement for qualitative research.
Keywords: thematic analysis, large language models, natural language processing, qualitative analysis, social media, prompt engineering, public health
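The four reported metrics can be computed directly from paired expert and model binary decisions. The sketch below uses made-up toy labels, and it pools all decisions together; the abstract does not state how the authors aggregated scores across the twelve themes, so that pooling is an assumption. The toy data show why accuracy can sit well above F1 when a theme is rare: the many true negatives raise accuracy but do not enter precision or recall.

```python
from typing import Dict, List


def binary_metrics(y_true: List[bool], y_pred: List[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for paired binary decisions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data: 10 decisions for a rare theme (2 expert positives).
expert = [True, True] + [False] * 8
model = [True, False, True] + [False] * 7  # 1 TP, 1 FN, 1 FP, 7 TN

scores = binary_metrics(expert, model)
print(scores)  # accuracy 0.8, precision 0.5, recall 0.5, f1 0.5
```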