Robot-Led Vision Language Model Wellbeing Assessment of Children

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated, fair, and clinically valid assessment of children’s mental health remains challenging. Method: This study introduces the first social-robot (NAO)-guided vision-language model (VLM) collaborative paradigm for automated psychological evaluation: NAO administers the Children’s Apperception Test (CAT), eliciting verbal narratives from children in response to picture stimuli; the VLM then performs end-to-end clinical reasoning and affective-state classification grounded in the CAT’s standardized framework. Contributions/Results: (1) First integration of a validated clinical assessment protocol into VLM inference; (2) Identification of significant gender bias—elevated false-positive rates for girls—highlighting fairness risks in real-world clinical deployment; (3) Empirical evaluation shows moderate inter-rater reliability with human clinicians for low-risk cases (Cohen’s κ = 0.52), but limited accuracy in detecting clinically salient concerns. This work advances interpretable, equitable, and human-centered AI for pediatric mental health assessment while issuing critical cautions regarding bias and clinical validity.
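The summary reports inter-rater reliability with human clinicians as Cohen's κ = 0.52. As a reference for how that statistic is computed, below is a minimal sketch of Cohen's kappa over two raters' labels; the label names and data are hypothetical, not taken from the paper.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical per-child labels ("concern" / "no_concern"); not the paper's data.
vlm          = ["no_concern", "no_concern", "no_concern", "concern", "concern",    "no_concern"]
psychologist = ["no_concern", "no_concern", "concern",    "concern", "no_concern", "no_concern"]
print(round(cohen_kappa(vlm, psychologist), 2))  # 0.25 with these made-up labels
```

A κ of 0.52, as reported, falls in the range conventionally described as moderate agreement, which matches the paper's framing for low-risk cases.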

📝 Abstract
This study presents a novel robot-led approach to assessing children's mental wellbeing using a Vision Language Model (VLM). Inspired by the Children's Apperception Test (CAT), the social robot NAO presented children with pictorial stimuli to elicit their verbal narratives of the images, which were then evaluated by a VLM in accordance with CAT assessment guidelines. The VLM's assessments were systematically compared to those provided by a trained psychologist. The results reveal that while the VLM demonstrates moderate reliability in identifying cases with no wellbeing concerns, its ability to accurately classify assessments with clinical concern remains limited. Moreover, although the model's performance was generally consistent when prompted with varying demographic factors such as age and gender, a significantly higher false positive rate was observed for girls, indicating potential sensitivity to the gender attribute. These findings highlight both the promise and the challenges of integrating VLMs into robot-led assessments of children's wellbeing.
Problem

Research questions and friction points this paper is trying to address.

Assessing children's mental wellbeing through a robot-led Vision Language Model pipeline
Comparing VLM assessments with a psychologist's evaluations to gauge reliability
Investigating demographic sensitivity (age, gender) in VLM-based wellbeing classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Robot-led VLM assessment of children's mental wellbeing
VLM evaluates children's verbal narratives elicited by pictorial stimuli
Compares VLM assessments against a trained psychologist's evaluations
N. I. Abbasi
Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
Fethiye Irmak Dogan
Postdoctoral Research Associate, University of Cambridge
Human-Robot Interaction, Robot Learning, Explainability, Conversational AI, Deep Learning
Guy Laban
Ben Gurion University of the Negev
Human-Robot Interaction, Human-centered AI, Self Disclosure, Affective Computing, Conversational AI
Joanna Anderson
Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
Tamsin Ford
Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
Peter B. Jones
Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
Hatice Gunes
Full Professor of Affective Intelligence & Robotics, University of Cambridge
Artificial Intelligence, Affective AI, Health AI, AI Fairness, Socially Assistive Robotics