🤖 AI Summary
To address the dual challenges of privacy preservation and communication overhead in grasp pose detection (GPD) in cluttered environments, this paper proposes a module-wise federated learning framework. Instead of synchronizing the full model in every round, the approach first analyzes the learning dynamics of the GPD model's functional modules and then applies a two-phase training protocol: a standard full-model training phase, followed by a communication-efficient phase in which only the slower-converging modules are trained and their partial updates aggregated. Communicating only this subset of modules reduces bandwidth consumption. On the GraspNet-1B dataset, the framework achieves higher accuracy than FedAvg and other baselines under the same communication budget. Real-world robot experiments also show a higher grasp success rate than baseline methods in cluttered scenes, supporting the practicality of the approach.
📝 Abstract
Grasp pose detection (GPD) is a fundamental capability for robotic autonomy, but its reliance on large, diverse datasets creates significant data privacy and centralization challenges. Federated Learning (FL) offers a privacy-preserving solution, but its application to GPD is hindered by the substantial communication overhead of large models, a key issue for resource-constrained robots. To address this, we propose a novel module-wise FL framework that begins by analyzing the learning dynamics of the GPD model's functional components. This analysis identifies slower-converging modules, to which our framework then allocates additional communication effort. This is realized through a two-phase process: a standard full-model training phase is followed by a communication-efficient phase where only the identified subset of slower-converging modules is trained and their partial updates are aggregated. Extensive experiments on the GraspNet-1B dataset demonstrate that our method outperforms standard FedAvg and other baselines, achieving higher accuracy for a given communication budget. Furthermore, real-world experiments on a physical robot validate our approach, showing a superior grasp success rate compared to baseline methods in cluttered scenes. Our work presents a communication-efficient framework for training robust, generalized GPD models in a decentralized manner, effectively improving the trade-off between communication cost and model performance.
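The two-phase process described above can be illustrated with a minimal sketch. It is not the authors' implementation: the module names (`backbone`, `grasp_head`), the toy parameter values, and the rule for picking slower-converging modules (largest mean parameter change between two global states) are all assumptions made for illustration, since the abstract does not specify how learning dynamics are measured. Local training is omitted; the client states stand in for the result of local updates.

```python
from typing import Dict, List, Tuple

State = Dict[str, List[float]]  # module name -> flattened parameters


def fedavg(states: List[State], weights: List[float], modules: List[str]) -> State:
    """Weighted average of the named modules across client states."""
    total = sum(weights)
    averaged: State = {}
    for name in modules:
        size = len(states[0][name])
        averaged[name] = [
            sum(w * s[name][i] for s, w in zip(states, weights)) / total
            for i in range(size)
        ]
    return averaged


def slowest_modules(prev: State, curr: State, k: int) -> List[str]:
    """Assumed proxy: modules whose parameters still change the most
    between two global states are treated as slower-converging."""
    def mean_change(name: str) -> float:
        diffs = [abs(a - b) for a, b in zip(curr[name], prev[name])]
        return sum(diffs) / len(diffs)

    return sorted(curr, key=mean_change, reverse=True)[:k]


def aggregate_round(
    global_state: State,
    client_states: List[State],
    weights: List[float],
    active: List[str],
) -> Tuple[State, int]:
    """Aggregate only the active modules; other modules keep their global
    values. Also returns the number of parameters uploaded by all clients."""
    new_state = dict(global_state)
    new_state.update(fedavg(client_states, weights, active))
    uploaded = len(client_states) * sum(len(global_state[m]) for m in active)
    return new_state, uploaded


# Toy setup with hypothetical module names and values.
global_state: State = {"backbone": [0.0] * 4, "grasp_head": [0.0] * 2}
clients: List[State] = [
    {"backbone": [0.1] * 4, "grasp_head": [1.0] * 2},
    {"backbone": [0.3] * 4, "grasp_head": [3.0] * 2},
]
weights = [1.0, 1.0]

# Phase 1: standard full-model aggregation.
phase1_state, phase1_cost = aggregate_round(
    global_state, clients, weights, list(global_state)
)

# Analysis step: choose the module that is still changing the most.
active = slowest_modules(global_state, phase1_state, k=1)

# Phase 2: only the selected module is communicated and aggregated.
phase2_state, phase2_cost = aggregate_round(phase1_state, clients, weights, active)
```

In this toy setup a full-model round uploads 12 parameters across the two clients, while a phase-two round that communicates only `grasp_head` uploads 4. This is the kind of saving the framework targets, although real savings depend on the relative sizes of the selected modules.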