🤖 AI Summary
To address the poor robustness and unnatural interaction common in motion-recognition systems for elderly users, this paper proposes a markerless multimodal virtual companionship system tailored for home environments. Methodologically, we design a lightweight joint pose-behavior recognition framework with dual-view geometric calibration, build an elderly-specific motion dataset and a customized CNN architecture, and integrate audio-visual attention fusion with edge-optimized inference. Our key contribution is the first real-time recognition of three caregiving gestures (greeting waves, petting motions, and heart-shaping gestures) with low latency (<180 ms) and high accuracy (92.3%). Experimental results demonstrate significant improvements in elderly users' interactive engagement and subjective well-being, along with strong robustness to environmental variation and practical deployability on edge devices.
📝 Abstract
This paper introduces CyanKitten, an interactive virtual companion system tailored for elderly users that integrates posture recognition, behavior recognition, and multimodal interaction. The system uses a three-tier architecture to process and interpret user movements and gestures, leveraging a dual-camera setup and a convolutional neural network trained specifically on elderly movement patterns. The behavior recognition module identifies and responds to three key interactive gestures: greeting waves, petting motions, and heart-shaping gestures. A multimodal integration layer combines visual and audio inputs to support natural, intuitive interaction. The paper details the technical implementation of each component, addressing challenges such as elderly-specific movement characteristics, real-time processing demands, and environmental adaptability. The result is an engaging, accessible virtual interaction experience designed to enhance the quality of life of elderly users.
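To make the multimodal integration concrete, the following is a minimal sketch of attention-weighted late fusion of visual and audio gesture scores. The gesture labels, logit values, and fixed attention weights here are illustrative assumptions; in the actual system the fusion attention is learned, not hard-coded.

```python
import numpy as np

# The three interactive gestures the behavior module recognizes.
GESTURES = ["greeting_wave", "petting", "heart_shape"]

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_modalities(visual_logits, audio_logits, attn=(0.7, 0.3)):
    """Attention-weighted late fusion of per-gesture scores.

    attn gives hypothetical fixed weights for (visual, audio);
    the paper's fusion attention module learns these instead.
    Returns the predicted gesture label and the fused probabilities.
    """
    v = softmax(np.asarray(visual_logits, dtype=float))
    a = softmax(np.asarray(audio_logits, dtype=float))
    fused = attn[0] * v + attn[1] * a
    return GESTURES[int(np.argmax(fused))], fused

# Example: strong visual evidence for a wave, weak audio cue for petting.
gesture, probs = fuse_modalities([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
# → "greeting_wave" (visual evidence dominates under the 0.7/0.3 weighting)
```

Because fusion happens at the score level, either modality can be dropped (e.g. in a noisy room) by zeroing its attention weight without retraining the visual pathway.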