🤖 AI Summary
Traditional frequency-based word clouds suffer from semantic fragmentation, failure to consolidate synonymous expressions, and interference from stop words, limiting their utility for early-stage, interpretable qualitative analysis of interview transcripts. This paper introduces ThemeClouds, an LLM-powered thematic word cloud tool that replaces raw term frequency with “participant mention breadth”—a measure of how widely a concept is referenced across participants. ThemeClouds automatically extracts concept-level themes from ASR transcripts (generated via Whisper), leverages customizable prompts and transparent control mechanisms for semantic consolidation, and applies participant-level weighting for visualization. Its key innovation is the adoption of participant coverage as the primary statistical dimension, enabling researcher-guided thematic modeling and interactive “difference word clouds” for cross-condition comparison. In a user study involving 31 participants and 155 interviews, ThemeClouds outperformed conventional word clouds, LDA, and BERTopic in uncovering authentic user concerns.
📝 Abstract
Word clouds are a common way to summarize qualitative interviews, yet traditional frequency-based methods often fail in conversational contexts: they surface filler words, ignore paraphrase, and fragment semantically related ideas. This limits their usefulness in early-stage analysis, when researchers need fast, interpretable overviews of what participants actually said. We introduce ThemeClouds, an open-source visualization tool that uses large language models (LLMs) to generate thematic, participant-weighted word clouds from dialogue transcripts. The system prompts an LLM to identify concept-level themes across a corpus and then counts how many unique participants mention each topic, yielding a visualization grounded in breadth of mention rather than raw term frequency. Researchers can customize prompts and visualization parameters, providing transparency and control. Using interviews from a user study comparing five recording-device configurations (31 participants; 155 transcripts, Whisper ASR), our approach surfaces more actionable device concerns than frequency clouds and topic-modeling baselines (e.g., LDA, BERTopic). We discuss design trade-offs for integrating LLM assistance into qualitative workflows, implications for interpretability and researcher agency, and opportunities for interactive analyses such as per-condition contrasts ("diff clouds").
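The core weighting scheme described above — counting unique participants per theme rather than raw term occurrences — can be sketched in a few lines. This is an illustrative sketch, not the tool's actual implementation; the input format (`themes_by_participant`), participant IDs, and theme labels are all hypothetical stand-ins for the output of the paper's LLM theme-extraction step.

```python
from collections import defaultdict

# Hypothetical input: concept-level themes an LLM pass assigned to each
# participant's transcript. Keys are participant IDs; values are the
# themes that participant mentioned anywhere in their interview.
themes_by_participant = {
    "P01": {"battery life", "privacy"},
    "P02": {"privacy", "audio quality"},
    "P03": {"battery life", "privacy"},
}

def mention_breadth(themes_by_participant):
    """Return, for each theme, the number of unique participants who
    mention it (breadth of mention, not raw term frequency)."""
    counts = defaultdict(int)
    for themes in themes_by_participant.values():
        # A set ensures each participant counts at most once per theme,
        # no matter how often they repeated it.
        for theme in set(themes):
            counts[theme] += 1
    return dict(counts)

print(mention_breadth(themes_by_participant))
# {'battery life': 2, 'privacy': 3, 'audio quality': 1}
```

In a word-cloud rendering, these counts would drive font size, so a concern raised once by many participants outweighs a term repeated many times by one person.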