Keyframe-Guided Structured Rewards for Reinforcement Learning in Long-Horizon Laboratory Robotics

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of low exploration efficiency and unstable convergence in long-horizon, high-precision laboratory robotic tasks, which arise from sparse rewards, multi-stage constraints, and noisy demonstrations. To overcome these issues, the authors propose a keyframe-guided structured reward generation framework that automatically extracts kinematics-aware keyframes from demonstrations. Leveraging a diffusion model in latent space, the method predicts stage-wise goals and constructs an explicit reward mechanism embedding task logic through geometric progress metrics. The system integrates multi-view visual encoding, a vision-language-action backbone, and human-in-the-loop reinforcement fine-tuning. Evaluated on four real-world biological experimentation tasks, the approach achieves an average success rate of 82% with only 40–60 minutes of online fine-tuning, substantially outperforming HG-DAgger (42%) and Hil-ConRFT (47%).
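The summary's "kinematics-aware keyframes" can be illustrated with a minimal sketch. The paper does not publish its extraction rule, so the heuristic below is an assumption: it flags timesteps where the end-effector's speed hits a near-zero local minimum or the gripper state toggles, two common kinematic cues for stage boundaries in multi-stage manipulation demonstrations. All names (`extract_keyframes`, `vel_eps`) are hypothetical.

```python
import numpy as np

def extract_keyframes(positions, gripper, vel_eps=1e-3):
    """Hypothetical kinematics-aware keyframe extractor.

    positions: (T, 3) array of end-effector positions per timestep.
    gripper:   length-T sequence of discrete gripper states (open/closed).

    Flags timesteps where the end-effector nearly stops (local speed
    minimum below vel_eps) or the gripper state changes.
    """
    # per-step speed: norm of consecutive position differences, length T-1
    vel = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    keyframes = set()
    # near-zero local minima of speed mark pauses between stages
    for t in range(1, len(vel) - 1):
        if vel[t] < vel_eps and vel[t] <= vel[t - 1] and vel[t] <= vel[t + 1]:
            keyframes.add(t)
    # gripper toggles (grasp/release) also mark stage boundaries
    for t in range(1, len(gripper)):
        if gripper[t] != gripper[t - 1]:
            keyframes.add(t)
    return sorted(keyframes)
```

On a demonstration that moves, pauses mid-trajectory, and closes the gripper during the pause, the extractor would return the pause timesteps as keyframes.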

📝 Abstract
Long-horizon precision manipulation in laboratory automation, such as pipette tip attachment and liquid transfer, requires policies that respect strict procedural logic while operating in continuous, high-dimensional state spaces. However, existing approaches struggle with reward sparsity, multi-stage structural constraints, and noisy or imperfect demonstrations, leading to inefficient exploration and unstable convergence. We propose a Keyframe-Guided Reward Generation Framework that automatically extracts kinematics-aware keyframes from demonstrations, generates stage-wise targets via a diffusion-based predictor in latent space, and constructs a geometric progress-based reward to guide online reinforcement learning. The framework integrates multi-view visual encoding, latent similarity-based progress tracking, and human-in-the-loop reinforcement fine-tuning on a Vision-Language-Action backbone to align policy optimization with the intrinsic stepwise logic of biological protocols. Across four real-world laboratory tasks, including high-precision pipette attachment and dynamic liquid transfer, our method achieves an average success rate of 82% after 40–60 minutes of online fine-tuning. Compared with HG-DAgger (42%) and Hil-ConRFT (47%), our approach demonstrates the effectiveness of structured keyframe-guided rewards in overcoming exploration bottlenecks and providing a scalable solution for high-precision, long-horizon robotic laboratory automation.

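The "geometric progress-based reward" with "latent similarity-based progress tracking" can be sketched as follows. The sketch assumes the stage-wise latent goals have already been produced (in the paper, by the diffusion-based predictor); the exact reward shape is not given in this listing, so the negative-exponential distance term and per-stage bonus are assumptions. All names (`progress_reward`, `alpha`) are hypothetical.

```python
import numpy as np

def progress_reward(z_t, stage_goals, stage_idx, alpha=1.0):
    """Hypothetical stage-wise geometric progress reward.

    z_t:         current latent state embedding (e.g., from the
                 multi-view visual encoder).
    stage_goals: list of latent goal vectors, one per keyframe stage
                 (assumed given; in the paper, diffusion-predicted).
    stage_idx:   index of the currently active stage.

    Returns stage_idx plus a similarity term in (0, 1], so the total
    reward is monotone in task progress: each completed stage adds +1,
    and approaching the active stage's goal increases the fraction.
    """
    goal = stage_goals[stage_idx]
    dist = np.linalg.norm(z_t - goal)
    # exp(-alpha * dist) equals 1 exactly at the goal, decays with distance
    return float(stage_idx + np.exp(-alpha * dist))
```

This structure gives the dense, stage-aware signal the abstract describes: within a stage the reward rises smoothly toward the keyframe goal, and crossing a stage boundary yields a strictly larger reward than any state in the previous stage.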
Problem

Research questions and friction points this paper is trying to address.

long-horizon manipulation
reward sparsity
structured constraints
laboratory robotics
demonstration noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Keyframe-Guided Rewards
Diffusion-Based Predictor
Latent Space Progress Tracking
Vision-Language-Action Backbone
Long-Horizon Robotic Manipulation
Yibo Qiu
Suzhou Institute for Advanced Research, University of Science and Technology of China; School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China
Shu'ang Sun
Suzhou Institute for Advanced Research, University of Science and Technology of China; School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China
Haoliang Ye
Suzhou Institute for Advanced Research, University of Science and Technology of China; School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China
Ronald X Xu
Suzhou Institute for Advanced Research, University of Science and Technology of China; School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China
Mingzhai Sun
University of Science and Technology of China
Biomedical Engineering · deep learning · retinal imaging