🤖 AI Summary
Collaborative robots (cobots) lack interpretable, formally verifiable safety criteria in tool-grasping tasks. Method: This paper proposes an interpretable AI framework that integrates concept learning with generative grasp detection. We design an end-to-end vision-grasp model that jointly performs tool recognition and 6D optimal grasp-pose generation via neural networks, and we incorporate explainable-AI techniques, particularly Concept Activation Vectors (CAVs), to explicitly map low-level features to semantic safety concepts (e.g., "grasp stability" and "collision-avoidance margin"), yielding verifiable safety criteria. Results: Evaluated on an industrial tool dataset, the framework improves the grasp success rate by 12.3% and yields more consistent human-robot handover positions. Crucially, it provides human-understandable decision rationales, strengthening the safety and trustworthiness of cobots in dynamic human-robot coexistence scenarios.
📝 Abstract
Neural networks are often regarded as universal function approximators that can estimate any function. This flexibility, however, comes at the cost of high complexity, turning these networks into black-box models, which is especially relevant in safety-critical applications. To address this, we propose a pipeline for a collaborative robot (cobot) grasping algorithm that detects relevant tools and generates the optimal grasp. To increase the transparency and reliability of this approach, we integrate an explainable-AI method that explains the model's underlying prediction by extracting the learned features and correlating them with the corresponding classes in the input. These concepts are then used as additional criteria to ensure the safe handling of work tools. In this paper, we demonstrate the consistency of this approach and derive a criterion for improving the handover position. The approach was tested in an industrial environment, where a camera system was set up to enable a robot to pick up certain tools and objects.
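The concept-extraction step described above builds on Concept Activation Vectors. The standard CAV recipe, independent of this paper's specific model, is to train a linear probe that separates layer activations of concept examples (e.g., stable grasps) from random counterexamples; the probe's weight vector is the CAV, and the sign of the directional derivative of a class logit along it measures concept sensitivity. A minimal sketch with synthetic activations standing in for a real network's intermediate layer (all data here is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64  # assumed width of the probed layer

# Synthetic stand-ins for layer activations: examples labeled with the
# safety concept (e.g., "grasp stability") vs. random counterexamples.
concept_acts = rng.normal(1.0, 1.0, size=(100, d))
random_acts = rng.normal(0.0, 1.0, size=(100, d))

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 100 + [0] * 100)

# A linear probe separates concept from random activations; its weight
# vector (normal to the decision boundary) is the CAV.
probe = LogisticRegression(max_iter=1000).fit(X, y)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Concept sensitivity of a prediction: directional derivative of the
# class logit along the CAV (here a synthetic gradient stands in for
# d(logit)/d(activations) obtained via backpropagation).
grad = rng.normal(size=d)
sensitivity = float(grad @ cav)
print(cav.shape, sensitivity)
```

In a full TCAV-style analysis, the fraction of inputs with positive sensitivity would quantify how strongly the concept influences the grasp decision; here it could serve as the kind of verifiable safety criterion the framework describes.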