🤖 AI Summary
Existing interactive electrical muscle stimulation (EMS) systems employ static feedback and lack contextual awareness, limiting adaptability to dynamic tasks and individual user conditions. This paper introduces the first context-aware generative EMS system that jointly interprets user voice commands and egocentric visual input to synthesize personalized, biomechanically compliant stimulation sequences—respecting joint limits, kinematic chains, and physiological EMS thresholds—in real time. Methodologically, we pioneer the tight integration of multimodal large-model reasoning with explicit biomechanical modeling, incorporating modules for object/handedness detection, situational language understanding, rigid-body dynamics simulation, and EMS waveform synthesis. Experiments demonstrate zero-shot generalization to unseen tasks—including pill-bottle opening and spray-can shaking—without task-specific programming, enabling universal physical assistance. Our system significantly enhances the adaptability, safety, and practical utility of EMS interfaces.
📝 Abstract
Decades of research on interactive electrical muscle stimulation (EMS) have revealed its promise as a wearable interface for physical assistance: EMS directly demonstrates movements through the user's body (e.g., shaking a spray can before painting). However, interactive EMS systems are highly specialized because their feedback is (1) fixed (e.g., one program executes spray-can instructions, another executes piano instructions) and (2) non-contextual (e.g., using a spray can while cooking likely involves cooking oil, not paint, and thus shaking is unnecessary). To address this, we explored a more flexible approach and engineered a system that generates muscle-stimulation instructions given the user's context. Through our examples, we show that such a system is flexible: it enables unprecedented EMS interactions (e.g., opening a child-proof pill bottle cap) but also replicates existing systems (e.g., shaking a spray can), all without requiring task-specific programming. To achieve this, our system takes in the user's spoken requests and images from their point of view. It uses computer vision (e.g., to detect objects and handedness) and large language models (e.g., to reason about objects and situations) to generate textual instructions. Finally, these instructions are constrained by biomechanical knowledge (e.g., joint limits, the kinematic chain, and EMS capabilities) to produce suitable muscle-stimulation gestures. We believe our concept marks a shift toward more general-purpose EMS interfaces, enabling more flexible and context-aware assistance.
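The pipeline described above (speech + egocentric vision → language-model reasoning → biomechanical constraining → stimulation gestures) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: every function name, joint limit, and intensity threshold below is a hypothetical placeholder, and the language-model reasoning step is stubbed out with a lookup.

```python
# Hypothetical sketch of the context-aware EMS pipeline from the abstract.
# All names, joint limits, and current thresholds are illustrative
# assumptions, not values from the paper.

JOINT_LIMITS_DEG = {"wrist_flexion": (-60, 60)}  # assumed joint range
EMS_MAX_INTENSITY_MA = 30                        # assumed safety ceiling

def understand_request(utterance, detected_objects):
    """Stand-in for the LLM reasoning step: maps the spoken request plus
    vision-detected objects to a plan of (joint, angle_deg, intensity_mA)."""
    if "open" in utterance and "pill bottle" in detected_objects:
        return [("wrist_flexion", 45, 25)]
    if "shake" in utterance and "spray can" in detected_objects:
        return [("wrist_flexion", 50, 20), ("wrist_flexion", -50, 20)]
    return []  # no applicable assistance

def apply_biomechanical_constraints(plan):
    """Clamps each step to joint limits and the EMS intensity ceiling,
    mirroring the abstract's biomechanical-constraint stage."""
    safe = []
    for joint, angle, intensity in plan:
        lo, hi = JOINT_LIMITS_DEG[joint]
        safe.append((joint, max(lo, min(hi, angle)),
                     min(intensity, EMS_MAX_INTENSITY_MA)))
    return safe

# Example: a request that exceeds the assumed limits gets clamped.
plan = [("wrist_flexion", 90, 50)]
print(apply_biomechanical_constraints(plan))  # [('wrist_flexion', 60, 30)]
```

The key design point the sketch illustrates is the separation of concerns: the open-ended reasoning component proposes motions freely, while a deterministic constraint layer guarantees that whatever it proposes stays within joint limits and EMS safety thresholds.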