🤖 AI Summary
Current AI companions lack grounding in developmental psychology, which limits their capacity to provide the attachment-based emotional support that children need. To address three gaps (the absence of developmentally informed AI architectures, the need to balance engagement with safety, and the lack of standardized evaluation for attachment-based capabilities), this work introduces DinoCompanion, which the authors describe as the first attachment-theory-grounded multimodal companion robot for children. Its contributions include: (1) CARPO (Child-Aware Risk-calibrated Preference Optimization), a training objective that maximizes engagement while applying epistemic-uncertainty-weighted risk penalties; (2) AttachSecure-Bench, a benchmark covering ten attachment-centric competencies with strong expert agreement (κ = 0.81); and (3) a multimodal dataset of 128 caregiver–child dyads comprising 125,382 annotated clips with paired preference and risk labels. The system combines multimodal fusion, hierarchical memory, and uncertainty-aware risk modeling. DinoCompanion reaches an overall score of 57.15%, ahead of GPT-4o (50.29%) and Claude-3.7-Sonnet (53.43%), with 72.99% on secure-base behaviors (human experts: 78.4%) and 69.73% on attachment risk detection.
📝 Abstract
Children's emotional development fundamentally relies on secure attachment relationships, yet current AI companions lack the theoretical foundation to provide developmentally appropriate emotional support. We introduce DinoCompanion, the first attachment-theory-grounded multimodal robot for emotionally responsive child-AI interaction. We address three critical challenges in child-AI systems: the absence of developmentally informed AI architectures, the need to balance engagement with safety, and the lack of standardized evaluation frameworks for attachment-based capabilities. Our contributions include: (i) a multimodal dataset of 128 caregiver-child dyads containing 125,382 annotated clips with paired preference-risk labels, (ii) CARPO (Child-Aware Risk-calibrated Preference Optimization), a novel training objective that maximizes engagement while applying epistemic-uncertainty-weighted risk penalties, and (iii) AttachSecure-Bench, a comprehensive evaluation benchmark covering ten attachment-centric competencies with strong expert consensus (κ = 0.81). DinoCompanion achieves state-of-the-art performance (57.15%), outperforming GPT-4o (50.29%) and Claude-3.7-Sonnet (53.43%), with strong secure base behaviors (72.99%, approaching the human expert level of 78.4%) and the best attachment risk detection (69.73%). Ablations indicate that multimodal fusion, uncertainty-aware risk modeling, and hierarchical memory each contribute to coherent, emotionally attuned interactions.
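The abstract describes CARPO only in words, so the following is a minimal sketch of what an objective of this general shape could look like, not the authors' formulation. It assumes a DPO-style preference term as the engagement signal and uses the spread of several risk estimates as a stand-in for epistemic uncertainty. The function names, the `lam` and `uncertainty_weight` parameters, and the way the two terms are combined are all hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def preference_loss(policy_margin: float, reference_margin: float,
                    beta: float = 0.1) -> float:
    """DPO-style term: -log(sigmoid(beta * (policy_margin - reference_margin))).

    Each margin is log p(preferred reply) - log p(rejected reply).
    Assumption: this term stands in for "maximizing engagement".
    """
    x = beta * (policy_margin - reference_margin)
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def risk_penalty(risk_samples: Sequence[float],
                 uncertainty_weight: float = 1.0) -> float:
    """Mean predicted risk, scaled up when the risk estimates disagree.

    risk_samples: risk probabilities for one reply from several ensemble
    members or stochastic passes (hypothetical setup). Their population
    standard deviation is used as a proxy for epistemic uncertainty.
    """
    spread = pstdev(risk_samples)
    return mean(risk_samples) * (1.0 + uncertainty_weight * spread)


def carpo_like_loss(policy_margin: float, reference_margin: float,
                    risk_samples: Sequence[float], lam: float = 1.0,
                    beta: float = 0.1, uncertainty_weight: float = 1.0) -> float:
    """Preference term plus a weighted, uncertainty-scaled risk penalty."""
    return (preference_loss(policy_margin, reference_margin, beta)
            + lam * risk_penalty(risk_samples, uncertainty_weight))


# Same mean risk (0.5), but the second reply's risk estimates disagree more.
confident = carpo_like_loss(2.0, 0.5, [0.5, 0.5, 0.5])
uncertain = carpo_like_loss(2.0, 0.5, [0.2, 0.5, 0.8])
```

Under these assumptions, two replies with the same average predicted risk are penalized differently: the one whose risk estimates disagree more receives the larger loss, which pushes the policy away from responses whose safety the model is unsure about.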