🤖 AI Summary
To address weak error-correction capability and excessive reliance on human intervention when robots grasp visually ambiguous objects in uncertain environments, this paper proposes an autonomous reflection-and-correction framework based on Large Vision-Language Models (LVLMs). The method introduces a novel reflective reasoning mechanism enabling multi-round strategy iteration and dynamic adaptation after failure, and designs a structured, cumulative, and reusable experience memory module to support closed-loop autonomous learning. Technically, it integrates LVLM-based visual understanding, grasp pose estimation, and multi-step reasoning for decision-making. Experiments on eight ambiguous-condition objects across three categories demonstrate a 32.7% improvement in grasp success rate over AnyGrasp and GPT-4V, with a 91.4% error recovery rate. The framework significantly enhances robotic robustness and adaptability in open, ambiguous real-world scenarios.
📝 Abstract
As robotic technology rapidly develops, robots are being employed in an increasing number of fields. However, due to the complexity of deployment environments and the prevalence of ambiguous-condition objects, the practical application of robotics still faces many challenges, leading to frequent errors. Traditional methods and some LLM-based approaches, although improved, still require substantial human intervention and struggle with autonomous error correction in complex scenarios. In this work, we propose RoboReflect, a novel framework leveraging large vision-language models (LVLMs) to enable self-reflection and autonomous error correction in robotic grasping tasks. RoboReflect allows robots to automatically adjust their strategies based on unsuccessful attempts until successful execution is achieved. The corrected strategies are saved in a memory module for future task reference. We evaluate RoboReflect through extensive testing on eight common objects across three categories that are prone to ambiguous conditions. Our results demonstrate that RoboReflect not only outperforms existing grasp pose estimation methods like AnyGrasp and high-level action planning techniques using GPT-4V, but also significantly enhances the robot's ability to adapt and correct errors independently. These findings underscore the critical importance of autonomous self-reflection in robotic systems while effectively addressing the challenges posed by ambiguous environments.
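The closed loop the abstract describes (attempt a grasp, reflect on the failure with an LVLM, revise the strategy, and store the corrected strategy for reuse) can be sketched as follows. This is a minimal illustrative sketch, not the RoboReflect implementation: all names (`MockVLM`, `attempt_grasp`, `reflect_and_grasp`) and the toy failure logic are assumptions for demonstration.

```python
from dataclasses import dataclass

@dataclass
class GraspResult:
    success: bool
    observation: str = ""  # failure description a real system would derive from images

class MockVLM:
    """Stand-in for an LVLM that proposes and revises grasp strategies."""
    def propose_strategy(self, task):
        return "top-down grasp"
    def reflect(self, task, strategy, observation):
        # A real LVLM would reason over the failure observation;
        # here we hard-code one plausible correction.
        return "side grasp" if "slipped" in observation else strategy

def attempt_grasp(strategy):
    # Toy environment: only a side grasp succeeds on this slippery object.
    if strategy == "side grasp":
        return GraspResult(True)
    return GraspResult(False, "object slipped from gripper")

def reflect_and_grasp(task, vlm, memory, max_attempts=5):
    # Reuse a previously corrected strategy if this task was solved before.
    strategy = memory.get(task) or vlm.propose_strategy(task)
    for _ in range(max_attempts):
        result = attempt_grasp(strategy)
        if result.success:
            memory[task] = strategy  # cumulative, reusable experience
            return strategy
        # Reflect on the failure and revise the strategy for the next round.
        strategy = vlm.reflect(task, strategy, result.observation)
    return None

memory = {}
print(reflect_and_grasp("grasp glass cup", MockVLM(), memory))  # prints "side grasp"
```

On a second call with the same task, the stored strategy in `memory` is reused directly, skipping the failed first attempt; this mirrors the paper's claim that corrected strategies become reusable experience.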