An AI-Powered Research Assistant in the Lab: A Practical Guide for Text Analysis Through Iterative Collaboration with LLMs

📅 2025-05-14
🤖 AI Summary
Problem: Constructing a taxonomy to classify unstructured text (e.g., personal goal statements) is time-intensive, prone to researcher bias, and hard to reproduce. Method: This paper proposes a human–AI collaborative, iterative approach to text analysis that combines top-down and bottom-up strategies to generate, evaluate, refine, and validate a taxonomy. Through prompt engineering, domain researchers and large language models (LLMs) collaborate over multiple turns, with human feedback driving each round of taxonomy refinement. Intercoder reliability is quantified with Cohen’s κ within a structured coding framework. Results: Empirical evaluation on a life-domain dataset achieves κ > 0.85, improving analytical efficiency, reliability, and reproducibility. The work integrates LLMs into the full qualitative analysis loop and offers a methodology for low-bias, high-fidelity classification of open text.
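As a rough illustration of the reliability measure named above, the sketch below computes Cohen's κ for two coders (for instance, a researcher and an LLM) who labelled the same items. The function name `cohens_kappa` and the example labels are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items where both coders chose the same category.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over categories.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels, for illustration only.
human = ["health", "career", "health", "family", "career", "health"]
model = ["health", "career", "family", "family", "career", "health"]
kappa = cohens_kappa(human, model)
print(round(kappa, 3))  # 0.75
```

Here the coders agree on 5 of 6 items (observed agreement 0.833) while chance agreement is 0.333, so κ = (0.833 − 0.333) / (1 − 0.333) = 0.75.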

📝 Abstract
Analyzing texts such as open-ended responses, headlines, or social media posts is a time- and labor-intensive process highly susceptible to bias. LLMs are promising tools for text analysis, using either a predefined (top-down) or a data-driven (bottom-up) taxonomy, without sacrificing quality. Here we present a step-by-step tutorial to efficiently develop, test, and apply taxonomies for analyzing unstructured data through an iterative and collaborative process between researchers and LLMs. Using personal goals provided by participants as an example, we demonstrate how to write prompts to review datasets and generate a taxonomy of life domains, evaluate and refine the taxonomy through prompt and direct modifications, test the taxonomy and assess intercoder agreements, and apply the taxonomy to categorize an entire dataset with high intercoder reliability. We discuss the possibilities and limitations of using LLMs for text analysis.
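To make the "apply the taxonomy" step more concrete, here is a minimal sketch of how a top-down coding prompt might be assembled from a fixed taxonomy and a batch of responses. The function name `build_categorization_prompt`, the prompt wording, the category names, and the example goals are all hypothetical; the tutorial's actual prompts may differ, and no LLM is called here.

```python
def build_categorization_prompt(taxonomy, responses):
    """Assemble a top-down coding prompt: fixed categories, numbered responses."""
    category_lines = [f"- {name}: {definition}" for name, definition in taxonomy.items()]
    response_lines = [f"{i}. {text}" for i, text in enumerate(responses, start=1)]
    return "\n".join([
        "You are assisting with qualitative coding of personal goals.",
        "Assign each goal to exactly one category from this taxonomy:",
        *category_lines,
        "Answer with one line per goal in the form '<number>: <category>'.",
        "Goals:",
        *response_lines,
    ])

# Hypothetical life-domain taxonomy and goal statements, for illustration only.
taxonomy = {
    "Health": "physical or mental well-being",
    "Career": "work, studies, or professional growth",
    "Relationships": "family, friends, or romantic partners",
}
goals = ["Run a half marathon", "Call my parents every week"]
prompt = build_categorization_prompt(taxonomy, goals)
print(prompt)
```

Keeping the category definitions in a single dictionary means a refinement round only has to edit that dictionary, and the same prompt template can then be reused for both human and model coders.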
Problem

Research questions and friction points this paper is trying to address.

Automating time-consuming text analysis to reduce bias
Developing efficient taxonomies for unstructured data using LLMs
Improving intercoder reliability in text categorization tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative collaboration between researchers and LLMs
Prompt-based taxonomy generation and refinement
High intercoder reliability in dataset categorization
Gino Carmona-Díaz
Psychology Department, Universidad de los Andes, Bogotá, Colombia; Laboratorio de Emociones y Juicios Morales, Universidad de los Andes, Bogotá, Colombia
María Alejandra Grisales
Social and Human Sciences Faculty, Universidad Externado de Colombia, Bogotá, Colombia
Chandra Sripada
Professor, Psychiatry and Philosophy, University of Michigan
Psychiatry, Philosophy, Self-Control, ADHD, Neuroimaging
Santiago Amaya
Department of Philosophy, Rice University, Houston, TX, USA
Michael Inzlicht
Department of Psychology, University of Toronto; Rotman School of Management, University of Toronto
Juan Pablo Bermúdez
Social and Human Sciences Faculty, Universidad Externado de Colombia, Bogotá, Colombia; Department of Philosophy, University of Southampton, Southampton, UK