AI-Augmented LLMs Achieve Therapist-Level Responses in Motivational Interviewing

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) lack systematic behavioral assessment frameworks for Motivational Interviewing (MI) in addiction treatment. Method: We propose the first User-Perceived Quality (UPQ) computational evaluation framework tailored to MI, grounded in a human-AI collaborative annotation paradigm that identifies 17 MI-consistent and MI-inconsistent behavioral indicators. Integrating explainable AI (XAI), deep learning-based behavioral modeling, and customized chain-of-thought prompting, the approach enhances empathic reflection while suppressing inappropriate advice generation. Contribution/Results: We establish a quantitative MI behavioral assessment system; empirically demonstrate that GPT-4 outperforms human clinicians in advice management and achieves clinically acceptable overall response quality; and show significant UPQ improvement, though limitations persist in complex emotional understanding.

📝 Abstract
Large language models (LLMs) like GPT-4 show potential for scaling motivational interviewing (MI) in addiction care, but require systematic evaluation of therapeutic capabilities. We present a computational framework assessing user-perceived quality (UPQ) through expected and unexpected MI behaviors. Analyzing human therapist and GPT-4 MI sessions via human-AI collaboration, we developed predictive models integrating deep learning and explainable AI to identify 17 MI-consistent (MICO) and MI-inconsistent (MIIN) behavioral metrics. A customized chain-of-thought prompt improved GPT-4's MI performance, reducing inappropriate advice while enhancing reflections and empathy. Although GPT-4 remained marginally inferior to therapists overall, it demonstrated superior advice management capabilities. The model achieved measurable quality improvements through prompt engineering, yet showed limitations in addressing complex emotional nuances. This framework establishes a pathway for optimizing LLM-based therapeutic tools through targeted behavioral metric analysis and human-AI co-evaluation. Findings highlight both the scalability potential and current constraints of LLMs in clinical communication applications.
Problem

Research questions and friction points this paper is trying to address.

Evaluating therapeutic capabilities of LLMs in addiction care
Improving GPT-4's MI performance via prompt engineering
Assessing scalability and limitations of LLMs in clinical communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

Computational framework assessing user-perceived quality
Predictive models integrating deep learning and explainable AI
Customized chain-of-thought prompt improving GPT-4 performance
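
A minimal sketch of what such an MI-tailored chain-of-thought prompt might look like. The stage wording and template below are purely illustrative assumptions, not the paper's actual prompt; they only mirror the behaviors the paper targets (more reflections and empathy, less unsolicited advice):

```python
# Illustrative sketch of an MI-tailored chain-of-thought prompt.
# The reasoning stages and wording are hypothetical, not the
# authors' published prompt.

MI_COT_TEMPLATE = """You are a motivational interviewing (MI) counselor.
Before replying, reason through these steps:
1. Identify the client's change talk and sustain talk.
2. Draft a complex reflection of the client's feeling or meaning.
3. Check the draft and remove any unsolicited advice (MI-inconsistent).
4. If advice seems essential, first ask the client's permission to share it.
Then write only the final counselor reply.

Client: {utterance}
Counselor:"""

def build_mi_prompt(utterance: str) -> str:
    """Fill the chain-of-thought template with a client utterance."""
    return MI_COT_TEMPLATE.format(utterance=utterance)

prompt = build_mi_prompt("I know I drink too much, but it helps me relax.")
print(prompt)
```

The resulting string would be sent as the model's instruction; the explicit "check and remove advice" step reflects the paper's finding that prompt engineering can suppress MI-inconsistent advice while strengthening reflections.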
Yinghui Huang
Research Institute of Digital Governance and Management Decision Innovation, Wuhan University of Technology, Wuhan, 430070, China, and also with School of Management, Wuhan University of Technology, Wuhan, 430070, China
Yuxuan Jiang
Research Institute of Digital Governance and Management Decision Innovation, Wuhan University of Technology, Wuhan, 430070, China, and also with School of Management, Wuhan University of Technology, Wuhan, 430070, China
Hui Liu
Key Laboratory of Adolescent Cyberpsychology and Behavior (Central China Normal University), Ministry of Education, Wuhan 430079, China
Yixin Cai
Research Institute of Digital Governance and Management Decision Innovation, Wuhan University of Technology, Wuhan, 430070, China, and also with School of Management, Wuhan University of Technology, Wuhan, 430070, China
Weiqing Li
School of Economics and Management, Hubei University of Technology, Wuhan, 430068, China
Xiangen Hu
Chair Professor of Learning Sciences and Technologies, Hong Kong Polytechnic University
Cognitive Psychology, Research Design and Statistics, Artificial Intelligence, Intelligent Tutoring