Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case

📅 2025-07-14

🤖 AI Summary

This study investigates whether large language models (LLMs) can perform expert-level qualitative thematic analysis of social media content, focusing on xylazine wound management, an inductive task that demands clinical domain expertise. To address the limitations of conventional multi-label classification for low-frequency, specialized themes, the authors frame the task as a series of binary classifications combined with few-shot prompting. They evaluate zero-shot, one-shot, and few-shot prompting strategies; GPT-4o with two-shot prompting achieves 90.9% accuracy and an F1 score of 0.71 on the validation set. Thematic distributions generated by the model exhibit strong agreement with expert human coding (Spearman’s ρ > 0.9). The approach offers a reproducible, scalable way to apply LLMs to large-scale qualitative analysis while preserving domain-specific interpretive rigor.

📝 Abstract

Background: Large language models (LLMs) face challenges in inductive thematic analysis, a task requiring deep interpretive and domain-specific expertise. We evaluated the feasibility of using LLMs to replicate expert-driven thematic analysis of social media data.

Methods: Using two temporally non-intersecting Reddit datasets on xylazine (n=286 and n=686, for model optimization and validation, respectively) with twelve expert-derived themes, we evaluated five LLMs against expert coding. We modeled the task as a series of binary classifications, rather than a single, multi-label classification, employing zero-, single-, and few-shot prompting strategies and measuring performance via accuracy, precision, recall, and F1-score.

Results: On the validation set, GPT-4o with two-shot prompting performed best (accuracy: 90.9%; F1-score: 0.71). For high-prevalence themes, model-derived thematic distributions closely mirrored expert classifications (e.g., xylazine use: 13.6% vs. 17.8%; MOUD use: 16.5% vs. 17.8%).

Conclusions: Our findings suggest that few-shot LLM-based approaches can automate thematic analyses, offering a scalable supplement for qualitative research.

Keywords: thematic analysis, large language models, natural language processing, qualitative analysis, social media, prompt engineering, public health
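The four reported metrics can be illustrated with a minimal sketch that treats every (post, theme) pair as one binary decision. Pooling all decisions into a single confusion matrix is an assumption made here for illustration; the abstract does not say how the authors aggregated scores across themes, and the function name `binary_metrics` is hypothetical.

```python
from typing import Dict, List, Set


def binary_metrics(
    gold: List[Set[str]], pred: List[Set[str]], themes: List[str]
) -> Dict[str, float]:
    """Score theme assignments as pooled binary decisions.

    gold[i] and pred[i] hold the themes assigned to post i by the expert
    and by the model. Pooling across themes is an illustrative assumption.
    """
    tp = fp = fn = tn = 0
    for g, p in zip(gold, pred):
        for theme in themes:
            in_gold, in_pred = theme in g, theme in p
            if in_gold and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_gold:
                fn += 1
            else:
                tn += 1

    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Because most posts do not express most themes, true negatives dominate the pooled count. Accuracy can therefore sit well above F1, which is consistent with the gap between the 90.9% accuracy and the 0.71 F1-score reported above.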

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for automated thematic analysis of social media data

Assessing LLM performance in replicating expert-driven thematic classifications

Developing scalable few-shot prompting strategies for qualitative research

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs for inductive thematic analysis

Binary classification with few-shot prompting

GPT-4o with two-shot prompting reaches 90.9% accuracy and an F1-score of 0.71 on the validation set
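The binary, few-shot framing listed above might look roughly like the following sketch: one yes/no prompt per theme, each carrying a couple of labeled examples. The prompt wording, the theme name "wound care", the helper names, and the `ask` callable (a stand-in for whatever LLM client is used) are all assumptions for illustration; the paper's actual prompts are not reproduced here.

```python
from typing import Callable, Dict, List, Set, Tuple

Example = Tuple[str, bool]  # (post text, whether the theme applies)


def build_prompt(
    theme: str, definition: str, examples: List[Example], post: str
) -> str:
    """Assemble a yes/no prompt for a single theme (hypothetical wording)."""
    lines = [
        f"Theme: {theme}",
        f"Definition: {definition}",
        "Answer Yes or No: does the post express this theme?",
        "",
    ]
    # Few-shot part: two labeled examples would give a two-shot prompt.
    for text, label in examples:
        lines += [f"Post: {text}", f"Answer: {'Yes' if label else 'No'}", ""]
    lines += [f"Post: {post}", "Answer:"]
    return "\n".join(lines)


def classify_post(
    post: str,
    themes: Dict[str, str],
    examples: Dict[str, List[Example]],
    ask: Callable[[str], str],
) -> Set[str]:
    """Run one binary query per theme and collect the themes answered Yes.

    `ask` sends a prompt to a model and returns its text reply; it is a
    placeholder so the sketch stays independent of any specific API.
    """
    assigned: Set[str] = set()
    for theme, definition in themes.items():
        prompt = build_prompt(theme, definition, examples.get(theme, []), post)
        if ask(prompt).strip().lower().startswith("yes"):
            assigned.add(theme)
    return assigned
```

Querying each theme separately lets a post receive zero, one, or several themes, and lets the examples be tailored to each theme, which is one plausible reason this framing suits rare, specialized themes better than a single multi-label prompt.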
