🤖 AI Summary
To address the limitations of manual calibration and poor adaptability in upper-limb assistive exoskeletons, particularly the tuning of user-specific assistance thresholds, this paper proposes an adaptive parameter-tuning method based on offline reinforcement learning (offline RL). We introduce a multi-agent framework that decouples the optimization of biceps and triceps activation thresholds and, for the first time, apply Mixed Q-Functionals (MQF) to model the continuous action space of dynamic threshold adaptation. Evaluated on the MyoPro 2 platform across horizontal and vertical arm-movement tasks, our approach improves human–robot collaborative adaptability and demonstrates the feasibility of data-driven parameter tuning. Compared with conventional, expert-dependent manual calibration, the method reduces reliance on domain expertise; however, its generalization remains constrained by the scale and diversity of the offline dataset, and future work will prioritize expanding data coverage to enhance robustness.
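The decoupling can be pictured as two independent threshold agents, one per muscle group, each mapping recent state features to a continuous effort threshold. The sketch below is a minimal illustration, not the paper's implementation: the names (`ThresholdAgent`, `assist_command`), the feature dimension, the linear-sigmoid policy, and the threshold range are all assumptions.

```python
import numpy as np

class ThresholdAgent:
    """Hypothetical per-muscle agent: maps a feature vector (e.g., recent EMG
    statistics) to a continuous effort threshold. Weights would be learned
    offline; here they are zero-initialized placeholders."""

    def __init__(self, n_features: int, t_min: float = 0.05, t_max: float = 0.9):
        self.w = np.zeros(n_features)          # policy weights (learned offline)
        self.t_min, self.t_max = t_min, t_max  # admissible threshold range

    def threshold(self, features: np.ndarray) -> float:
        # Squash a linear score into [t_min, t_max].
        score = 1.0 / (1.0 + np.exp(-self.w @ features))
        return self.t_min + score * (self.t_max - self.t_min)

# One decoupled agent per muscle group, mirroring the paper's framing.
biceps_agent = ThresholdAgent(n_features=8)
triceps_agent = ThresholdAgent(n_features=8)

def assist_command(emg_biceps: float, emg_triceps: float, features: np.ndarray):
    # Assistance triggers only when measured activation exceeds the current,
    # state-dependent threshold for that muscle.
    flex = emg_biceps > biceps_agent.threshold(features)
    extend = emg_triceps > triceps_agent.threshold(features)
    return flex, extend

flex, extend = assist_command(0.4, 0.1, np.random.rand(8))
```

Because each muscle group gets its own agent, the trigger point for flexion and extension assistance can adapt independently across users and tasks instead of sharing one hand-tuned constant.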
📝 Abstract
Assistive exoskeletons have shown great potential in enhancing mobility for individuals with motor impairments, yet their effectiveness relies on precise parameter tuning for personalized assistance. In this study, we investigate the potential of offline reinforcement learning for optimizing effort thresholds in upper-limb assistive exoskeletons, aiming to reduce reliance on manual calibration. Specifically, we frame the problem as a multi-agent system where separate agents optimize biceps and triceps effort thresholds, enabling a more adaptive and data-driven approach to exoskeleton control. Mixed Q-Functionals (MQF) is employed to efficiently handle continuous action spaces while leveraging pre-collected data, thereby mitigating the risks associated with real-time exploration. Experiments were conducted using the MyoPro 2 exoskeleton across two distinct tasks involving horizontal and vertical arm movements. Our results indicate that the proposed approach can dynamically adjust threshold values based on learned patterns, potentially improving user interaction and control, though performance evaluation remains challenging due to dataset limitations.
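To make the offline, value-based side concrete, the following is a simplified MQF-style training step: each agent represents Q(s, a) as a state-dependent combination of fixed action-basis functions, the two agents' values are mixed by summation, and targets are formed by maximizing each functional over a grid of candidate actions. This is a hedged sketch under assumed details (state dimension, polynomial basis, sum mixing, grid search); it is not the paper's reported architecture or hyperparameters, and it omits any offline-specific conservatism or behavior regularization.

```python
import torch
import torch.nn as nn

def basis(a: torch.Tensor) -> torch.Tensor:
    # Fixed polynomial action basis phi(a) for a scalar action in [-1, 1]
    # (an assumed choice; any differentiable basis would do).
    return torch.stack([torch.ones_like(a), a, a**2, a**3], dim=-1)

class QFunctional(nn.Module):
    """State -> coefficients of the action basis, so Q(s, a) = <f(s), phi(a)>.
    Evaluating many candidate actions reuses one forward pass over the state."""

    def __init__(self, state_dim: int, n_basis: int = 4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                               nn.Linear(64, n_basis))

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return (self.f(s) * basis(a)).sum(-1)

agents = [QFunctional(state_dim=12) for _ in range(2)]  # biceps, triceps
opt = torch.optim.Adam([p for q in agents for p in q.parameters()], lr=1e-3)
gamma = 0.99

def td_step(batch):
    # One TD update on offline transitions (s, a_biceps, a_triceps, r, s_next).
    s, a0, a1, r, s_next = batch
    q_tot = agents[0](s, a0) + agents[1](s, a1)   # mix by summation
    with torch.no_grad():
        cand = torch.linspace(-1, 1, 21)          # candidate-action grid
        next_max = sum(
            q(s_next.unsqueeze(1).expand(-1, 21, -1),
              cand.expand(s_next.shape[0], -1)).max(dim=1).values
            for q in agents)
        target = r + gamma * next_max
    loss = nn.functional.mse_loss(q_tot, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example: one update on a synthetic batch of 32 offline transitions.
B = 32
batch = (torch.randn(B, 12), torch.rand(B) * 2 - 1, torch.rand(B) * 2 - 1,
         torch.randn(B), torch.randn(B, 12))
td_step(batch)
```

The appeal of the Q-functional form here is that greedy action selection in a continuous threshold space needs no separate actor network: the state is encoded once and the resulting functional is simply evaluated at many candidate actions.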