BEYONDWORDS is All You Need: Agentic Generative AI based Social Media Themes Extractor

📅 2025-02-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Balancing interpretability and accuracy remains challenging in topic modeling of unstructured social media text. Method: This paper proposes an end-to-end topic extraction framework tailored to autism-community tweets, introducing a novel three-stage paradigm: “embedding compression → clustering → agent-based chain-of-thought (CoT) generation.” It integrates Tweet-BERT embeddings, UMAP dimensionality reduction, non-negative matrix factorization (NMF), and a dual-LLM collaborative mechanism (one model for topic generation, one for quality verification), enhanced by agent-oriented CoT prompting for semantic refinement and interpretable representation. Results: Evaluated on real-world data, the framework achieves a 32% improvement in topic coherence and attains a Cohen’s Kappa of 0.81 in human evaluation. It operates with minimal supervision and exhibits strong cross-domain transferability, establishing a generalizable, robust paradigm for domain-specific community discourse analysis.
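For readers unfamiliar with the agreement statistic quoted in the summary, the following is a minimal sketch of how Cohen's Kappa is computed for two raters. The rater labels are made up for illustration and are not the paper's evaluation data; the function name `cohens_kappa` is also hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; treated here as full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Illustrative (made-up) judgments: is each generated theme acceptable?
rater_1 = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
rater_2 = ["yes", "yes", "no", "yes", "yes", "yes", "yes", "no", "yes", "yes"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

In this toy case the raters agree on 9 of 10 items (observed agreement 0.9) while chance agreement is 0.62, so kappa is (0.9 - 0.62) / (1 - 0.62), roughly 0.74. Kappa discounts the agreement that would be expected by chance, which is why it is lower than the raw 90% agreement.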

📝 Abstract

Thematic analysis of social media posts provides valuable insight into public discourse, yet traditional methods often struggle to capture the complexity and nuance of unstructured, large-scale text data. This study introduces a novel methodology for thematic analysis that integrates tweet embeddings from pre-trained language models, dimensionality reduction and matrix factorization, and generative AI to identify and refine latent themes. Our approach clusters compressed tweet representations and employs generative AI to extract and articulate themes through agentic Chain of Thought (CoT) prompting, with a secondary LLM for quality assurance. This methodology is applied to tweets from the autistic community, a group that increasingly uses social media to discuss their experiences and challenges. By automating the thematic extraction process, the aim is to uncover key insights while maintaining the richness of the original discourse. This autism case study demonstrates the utility of the proposed approach in improving thematic analysis of social media data, offering a scalable and adaptable framework that can be applied to diverse contexts. The results highlight the potential of combining machine learning and Generative AI to enhance the depth and accuracy of theme identification in online communities.
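The flow described in the abstract (cluster compressed tweet representations, prompt a generative model per cluster, then check the result with a second model) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is swapped in; the 2-D vectors stand in for already-compressed embeddings; the tweets are invented; and `generate` and `verify` are hypothetical callables standing in for the two LLMs.

```python
import math


def kmeans(points, seeds, iterations=10):
    """Plain k-means (assumed here); returns one cluster index per point."""
    centroids = [list(c) for c in seeds]  # copy so the caller's seeds are untouched
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def build_cot_prompt(tweets):
    """Hypothetical chain-of-thought prompt for one cluster of tweets."""
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(tweets))
    return (
        "You are analysing a cluster of related tweets.\n"
        f"{numbered}\n"
        "Think step by step: note recurring ideas, group them, "
        "then state one concise theme."
    )


def extract_themes(tweets, vectors, seeds, generate, verify, max_retries=2):
    """Cluster, generate one theme per cluster, and retry if the verifier rejects it."""
    labels = kmeans(vectors, seeds)
    themes = {}
    for k in sorted(set(labels)):
        cluster = [t for t, lab in zip(tweets, labels) if lab == k]
        prompt = build_cot_prompt(cluster)
        theme = generate(prompt)
        for _ in range(max_retries):
            if verify(theme, cluster):
                break
            theme = generate(prompt + "\nThe previous theme was rejected; revise it.")
        themes[k] = theme
    return themes


# Stand-ins for the two LLMs (placeholders, no model is called).
def fake_generate(prompt):
    n = sum(line[:1].isdigit() for line in prompt.splitlines())
    return f"placeholder theme for {n} tweets"


def fake_verify(theme, cluster):
    return bool(theme.strip())


# Invented tweets and toy 2-D "compressed embeddings".
tweets = [
    "Sensory overload at the supermarket again",
    "Bright lights and noise are exhausting",
    "Finally got my diagnosis at 34",
    "Waiting lists for assessment are so long",
]
vectors = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
seeds = [[0.0, 0.0], [1.0, 1.0]]
themes = extract_themes(tweets, vectors, seeds, fake_generate, fake_verify)
print(themes)
```

Passing the generator and verifier as plain callables keeps the sketch independent of any particular LLM client; in practice each would wrap a call to whichever model is used.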

Problem

Research questions and friction points this paper is trying to address.

Automates thematic analysis of social media posts

Uses generative AI to refine and articulate themes

Improves accuracy and depth of theme identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates tweet embeddings and matrix factorization

Uses generative AI for theme extraction

Employs Chain of Thought prompting for refinement
