SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention

📅 2025-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Social science content analysis has long relied on time-intensive manual coding, expert consensus-building, and iterative dictionary refinement. Method: This paper introduces the first end-to-end framework—modeling the entire coding–consensus–iteration pipeline as a collaborative, reflective, and evolvable multi-LLM agent system that integrates theory-driven design with human–AI co-creation. It features a multi-role agent architecture supporting dynamic codebook generation, structured deliberation protocols, and progressive multimodal human intervention. Results: Evaluated on multiple real-world datasets, the approach achieves near-expert inter-coder reliability (Cohen’s κ > 0.85) and strong theoretical alignment, improves analytical efficiency by 3–5×, and significantly enhances process interpretability and result reproducibility. Its core contribution is a paradigm shift—from purely manual analysis to a closed-loop, theory-guided workflow wherein LLMs execute coding while humans dynamically steer and refine outputs.
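The reliability figure above (Cohen's κ > 0.85) refers to the standard chance-corrected agreement statistic between two coders. As a minimal, standalone illustration (not code from the paper), κ can be computed from two coders' label sequences like this:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected inter-coder agreement between two label sequences."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Example: two coders label 10 texts with categories "pos"/"neg".
a = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "pos", "neg", "pos"]
b = ["pos", "pos", "neg", "pos", "neg", "pos", "pos", "pos", "neg", "pos"]
print(round(cohens_kappa(a, b), 3))  # → 0.783
```

Values above 0.8 are conventionally read as near-perfect agreement, which is why κ > 0.85 is described as near-expert reliability.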

📝 Abstract
Content analysis breaks down complex and unstructured texts into theory-informed numerical categories. In social science in particular, this process usually relies on multiple rounds of manual annotation, domain expert discussion, and rule-based refinement. In this paper, we introduce SCALE, a novel multi-agent framework that effectively **S**imulates **C**ontent **A**nalysis via **L**arge language model (LLM) ag**E**nts. SCALE imitates key phases of content analysis, including text coding, collaborative discussion, and dynamic codebook evolution, capturing the reflective depth and adaptive discussions of human researchers. Furthermore, by integrating diverse modes of human intervention, SCALE is augmented with expert input to further enhance its performance. Extensive evaluations on real-world datasets demonstrate that SCALE achieves human-approximated performance across various complex content analysis tasks, offering innovative potential for future social science research.
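The coding–discussion–codebook-evolution loop described in the abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the LLM agents are stubbed with keyword matchers, and all names (`Agent`, `discuss`, `revise_codebook`, `analyze`) are assumptions made for the example.

```python
from collections import Counter

class Agent:
    """Stub coder agent; in SCALE this role would be played by an LLM."""
    def __init__(self, name, keywords):
        self.name = name
        self.keywords = keywords  # stand-in for a codebook-conditioned prompt

    def code(self, text, codebook):
        # Assign the first codebook category whose cue words match the text.
        for category in codebook:
            if any(k in text.lower() for k in self.keywords.get(category, [])):
                return category
        return "other"

def discuss(codings):
    # Collaborative-discussion stub: resolve disagreements by majority vote.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*codings.values())]

def revise_codebook(codebook, consensus):
    # Codebook-evolution stub: promote "other" to an explicit category.
    if "other" in consensus and "other" not in codebook:
        codebook = codebook + ["other"]
    return codebook

def analyze(texts, codebook, agents, rounds=2):
    for _ in range(rounds):
        # 1. Text coding: each agent independently codes every text.
        codings = {a.name: [a.code(t, codebook) for t in texts] for a in agents}
        # 2. Discussion: agents deliberate toward a consensus coding.
        consensus = discuss(codings)
        # 3. Evolution: the codebook is refined from what the round revealed.
        codebook = revise_codebook(codebook, consensus)
    return consensus, codebook

texts = ["great policy outcome", "taxes went up again", "weather today"]
agents = [
    Agent("A", {"positive": ["great"], "negative": ["taxes"]}),
    Agent("B", {"positive": ["great", "outcome"], "negative": ["up"]}),
]
consensus, codebook = analyze(texts, ["positive", "negative"], agents)
print(consensus)  # → ['positive', 'negative', 'other']
print(codebook)   # → ['positive', 'negative', 'other']
```

In the real framework, human intervention would enter between rounds, e.g. an expert editing the revised codebook before the next coding pass.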
Problem

Research questions and friction points this paper is trying to address.

- Automates social science content analysis tasks
- Integrates human intervention with LLM agents
- Enhances performance in complex text coding
Innovation

Methods, ideas, or system contributions that make the work stand out.

- LLM agents simulate content analysis
- Integrates human expert intervention
- Dynamic codebook evolution mechanism