🤖 AI Summary
Existing social interaction detection systems generalize poorly because of rigid assumptions: controlled environments, support only for face-to-face dyadic conversations, and reliance on fixed-length temporal windows. This work introduces the first real-time, on-wrist social interaction detection framework for smartwatches, capable of detecting both face-to-face and virtual interactions while lifting constraints on environment and participant count. Methodologically, it detects foreground speech with a transfer-learning-based classifier, infers interaction boundaries from that signal together with paralinguistic cues (e.g., whispering), and runs a lightweight neural architecture for energy-efficient, on-device inference. Evaluated over 38 days across 11 participants in naturalistic settings, the system achieves 73.18% interaction detection accuracy; follow-up interviews with six users indicated 100% recall for their socially relevant interactions, suggesting both technical efficacy and practical utility.
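The boundary-inference algorithm itself is not spelled out here, so the following is a minimal sketch, assuming windowed classifier outputs: it merges foreground-speech and whisper-positive windows into variable-length interaction segments instead of committing to fixed-length windows. `Window`, `infer_interactions`, and the thresholds (`fs_threshold`, `max_gap_s`) are illustrative assumptions, not the authors' design or values.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Window:
    start_s: float    # window start time (seconds)
    end_s: float      # window end time (seconds)
    fs_prob: float    # foreground-speech probability from the classifier
    is_whisper: bool  # paralinguistic cue: whispering detected in window

def infer_interactions(
    windows: List[Window],
    fs_threshold: float = 0.5,  # illustrative, not from the paper
    max_gap_s: float = 30.0,    # bridge silences shorter than this
) -> List[Tuple[float, float]]:
    """Merge windows that contain social speech into variable-length
    interaction segments (start_s, end_s)."""
    segments: List[Tuple[float, float]] = []
    current: list = []
    for w in windows:
        active = w.fs_prob >= fs_threshold or w.is_whisper
        if not active:
            continue  # gaps are closed lazily via the distance check below
        if not current:
            current = [w.start_s, w.end_s]
        elif w.start_s - current[1] <= max_gap_s:
            current[1] = w.end_s  # extend the open segment across the gap
        else:
            segments.append((current[0], current[1]))
            current = [w.start_s, w.end_s]
    if current:
        segments.append((current[0], current[1]))
    return segments
```

Merging with a gap tolerance is one simple way to drop the fixed-window assumption the summary criticizes; a trained sequence model could serve the same purpose.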
📝 Abstract
Social interactions are a fundamental part of daily life and play a critical role in well-being. As emerging technologies offer opportunities to unobtrusively monitor behavior, there is growing interest in using them to better understand social experiences. However, automatically detecting interactions, particularly via wearable devices, remains underexplored. Existing systems are often limited to controlled environments, constrained to in-person interactions, and reliant on rigid assumptions such as the presence of two speakers within a fixed time window. These limitations limit their ability to generalize to diverse real-world interactions. To address these challenges, we developed a real-time, on-watch system capable of detecting both in-person and virtual interactions. The system leverages transfer learning to detect foreground speech (FS) and infers interaction boundaries based on FS and conversational cues such as whispering. In a real-world evaluation involving 11 participants over a total of 38 days (Mean = 3.45 days, SD = 2.73), the system achieved an interaction detection accuracy of 73.18%. Follow-up interviews with six participants indicated perfect recall for detecting interactions. These preliminary findings demonstrate the potential of our system to capture interactions in daily life, providing a foundation for applications such as personalized interventions targeting social anxiety.
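The abstract names transfer learning for FS detection but not a specific backbone. One common recipe, sketched below purely as an assumption rather than the authors' architecture, freezes a pretrained audio embedding (e.g., a YAMNet-style 1024-dimensional embedding) and trains only a small classification head, which also keeps on-device inference lightweight. `FSHead` and all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class FSHead(nn.Module):
    """Hypothetical lightweight head trained on top of frozen pretrained
    audio embeddings (transfer learning); layer sizes are illustrative."""
    def __init__(self, embed_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(64, 1),  # foreground speech vs. everything else
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # Per-window probability that the wearer is speaking in the foreground.
        return torch.sigmoid(self.net(emb)).squeeze(-1)

if __name__ == "__main__":
    head = FSHead()
    emb = torch.randn(8, 1024)  # stand-in for embeddings from a frozen backbone
    print(head(emb).shape)      # torch.Size([8])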