MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance

📅 2025-04-04

🤖 AI Summary

Visual perception failures on transparent objects severely limit the assistive capability of soft exoskeleton gloves. Method: This paper introduces MultiClear, which applies a vision foundation model for zero-shot segmentation of transparent objects, fuses RGB, depth, and auditory signals, and organizes control into a three-tier hierarchy (contextual awareness → multimodal fusion → PID-driven actuation). Contribution/Results: The system closes the loop from multimodal perception through decision-making to actuation in a tendon-driven glove equipped with an RGB-D camera and a built-in microphone. In real-world experiments it achieves a Grasping Ability Score of 70.37% on transparent objects, supporting adaptive grasping assistance. The framework targets human–robot collaborative manipulation of transparent and low-texture objects.
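The three-tier hierarchy can be pictured as a small pipeline: the top layer picks a grasp target from context, the middle layer turns fused sensor cues into a grasp phase, and the bottom layer receives a setpoint. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the phase names, the per-object closure targets, the 5 cm reach threshold, and all function names (`high_level`, `mid_level`, `setpoint`) are hypothetical, since the summary does not specify them.

```python
from dataclasses import dataclass
from enum import Enum


class GraspPhase(Enum):
    """Hypothetical grasp phases, used only for this illustration."""
    OPEN = "open"
    CLOSING = "closing"
    HOLDING = "holding"


# High level (hypothetical values): target finger closure per object class,
# normalised so that 0.0 = fully open and 1.0 = fully closed.
TARGET_CLOSURE = {"glass": 0.6, "bottle": 0.7, "default": 0.5}


@dataclass
class Percept:
    distance_m: float    # hand-to-object distance estimated from depth
    contact_heard: bool  # contact cue detected in the microphone signal


def high_level(label: str) -> float:
    """Contextual layer: choose a grasp target for the recognised object."""
    return TARGET_CLOSURE.get(label, TARGET_CLOSURE["default"])


def mid_level(p: Percept, phase: GraspPhase, reach_m: float = 0.05) -> GraspPhase:
    """Fusion layer: turn depth and audio cues into a phase transition."""
    if phase is GraspPhase.OPEN and p.distance_m < reach_m:
        return GraspPhase.CLOSING
    if phase is GraspPhase.CLOSING and p.contact_heard:
        return GraspPhase.HOLDING
    return phase


def setpoint(phase: GraspPhase, target: float) -> float:
    """Closure setpoint passed down to the low-level PID motor loop."""
    return 0.0 if phase is GraspPhase.OPEN else target


if __name__ == "__main__":
    target = high_level("glass")
    phase = GraspPhase.OPEN
    for p in (Percept(0.20, False), Percept(0.04, False), Percept(0.03, True)):
        phase = mid_level(p, phase)
        print(phase.name, setpoint(phase, target))
```

Run as a script, the loop prints `OPEN 0.0`, `CLOSING 0.6`, then `HOLDING 0.6`. Here the holding phase simply keeps the closing target; a real controller would likely adjust grip force once contact is detected.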

📝 Abstract

Grasping is a fundamental skill for interacting with the environment. However, grasping can be difficult for some people (e.g., due to disability). Wearable robotic solutions can enhance or restore hand function, and recent advances have leveraged computer vision to improve grasping capabilities. However, grasping transparent objects remains challenging due to their poor visual contrast and ambiguous depth cues. Furthermore, while multimodal control strategies incorporating tactile and auditory feedback have been explored for grasping transparent objects, the integration of vision with these modalities remains underdeveloped. This paper introduces MultiClear, a multimodal framework that enhances grasping assistance for transparent objects in a wearable soft exoskeleton glove by fusing RGB data, depth data, and auditory signals. The exoskeleton glove integrates a tendon-driven actuator with an RGB-D camera and a built-in microphone. To achieve precise and adaptive control, a hierarchical control architecture is proposed: a high-level control layer provides contextual awareness, a mid-level control layer processes multimodal sensory inputs, and a low-level control layer executes PID motor control for fine-tuned grasping adjustments. Transparent object segmentation is addressed by introducing a vision foundation model for zero-shot segmentation. The proposed system achieves a Grasping Ability Score of 70.37%, demonstrating its effectiveness in transparent object manipulation.
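The low-level layer is described only as "PID motor control". As a hedged illustration of what such a loop does, here is a textbook discrete PID controller in parallel form with conditional-integration anti-windup, driving a toy tendon-position model. The gains, the output range of [-1, 1], the 10 ms time step, and the integrator plant (position changes at a rate proportional to the command) are all assumptions made for the example; the paper's actual gains and motor model are not given here.

```python
from typing import Optional


class PID:
    """Discrete PID controller in parallel form with simple anti-windup."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float,
                 out_min: float = -1.0, out_max: float = 1.0) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float) -> float:
        error = setpoint - measured
        # Derivative of the error; zero on the first call to avoid a kick.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        trial_integral = self.integral + error * self.dt
        u = self.kp * error + self.ki * trial_integral + self.kd * derivative
        # Conditional integration: keep the new integral only when the
        # command is within the actuator range, otherwise saturate.
        if u > self.out_max:
            return self.out_max
        if u < self.out_min:
            return self.out_min
        self.integral = trial_integral
        return u


# Toy simulation (hypothetical plant and gains): motor speed is
# proportional to the command, so position integrates 2.0 * u over time.
DT = 0.01
pid = PID(kp=5.0, ki=1.0, kd=0.05, dt=DT)
position = 0.0  # normalised tendon position, 0 = fingers open
for _ in range(1000):
    command = pid.update(setpoint=0.6, measured=position)
    position += 2.0 * command * DT
print(f"final position: {position:.3f}")
```

Freezing the integral while the command is saturated keeps the integral term from winding up during the initial full-speed pull, which would otherwise cause extra overshoot once the tendon approaches the target.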

Problem

Research questions and friction points this paper is trying to address.

Enhancing grasping assistance for transparent objects using multimodal fusion

Integrating vision with tactile and auditory feedback in exoskeleton gloves

Improving transparent object segmentation via zero-shot vision foundation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal fusion of RGB, depth, and auditory signals

Hierarchical control architecture for adaptive grasping

Zero-shot segmentation for transparent objects
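The summary does not say how RGB, depth, and audio are combined, so the following is only a hypothetical rule-based late-fusion sketch of how a segmentation mask, a depth image, and a microphone frame could yield two grasp cues. It assumes that invalid depth pixels are encoded as 0.0 (a common RGB-D convention, relevant because transparent surfaces often produce missing depth), and the function names and thresholds (`reach_m`, `contact_rms`) are illustrative, not taken from the paper.

```python
import math
from statistics import median
from typing import Optional, Sequence, Tuple


def object_distance(depth_m: Sequence[Sequence[float]],
                    mask: Sequence[Sequence[bool]]) -> Optional[float]:
    """Median depth (metres) over valid pixels inside the segmentation mask.

    Pixels with depth 0.0 are treated as missing readings, which RGB-D
    sensors commonly report on transparent surfaces.
    """
    values = [d for d_row, m_row in zip(depth_m, mask)
              for d, m in zip(d_row, m_row) if m and d > 0.0]
    return median(values) if values else None


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio frame (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def fuse(depth_m: Sequence[Sequence[float]],
         mask: Sequence[Sequence[bool]],
         audio_frame: Sequence[float],
         reach_m: float = 0.05,
         contact_rms: float = 0.1) -> Tuple[bool, bool]:
    """Late fusion of depth and audio into (object_in_reach, contact_detected)."""
    dist = object_distance(depth_m, mask)
    in_reach = dist is not None and dist < reach_m
    contact = rms(audio_frame) > contact_rms
    return in_reach, contact


depth = [[0.0, 0.04, 0.30],
         [0.03, 0.0, 0.30]]
mask = [[True, True, False],
        [True, True, False]]
print(fuse(depth, mask, [0.5, -0.5, 0.5, -0.5]))  # (True, True)
```

Taking the median over only the valid masked pixels keeps a few background or dropout readings from dominating the distance estimate; a learned fusion model could replace these hand-set thresholds.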
