Autonomous Aerial Manipulation via Contextual Contrastive Meta Reinforcement Learning

📅 2026-06-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of enabling drones to autonomously grasp, transport, and deliver diverse payloads without pre-installed fixtures or human intervention. To this end, the authors propose the Aco² framework, which applies contextual contrastive meta-reinforcement learning to end-to-end aerial manipulation. The approach introduces a context observation encoder that infers payload-dependent dynamics from recent interaction history, shaped by a contrastive learning objective, so that the policy generalizes to unseen payloads without explicit system identification. By combining meta-reinforcement learning, contrastive representation learning, and domain randomization, the policy is trained entirely in simulation and deployed directly on a real quadrotor platform. Experimental results demonstrate successful fully autonomous pick-and-place operations on a variety of handle-equipped objects, validating the framework’s robustness and transferability.
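As an illustration of the domain randomization mentioned in the summary, the following minimal sketch draws a fresh set of payload parameters for each simulated episode. The parameter names (`mass_kg`, `handle_offset_m`, `drag_coeff`), their ranges, and the uniform sampling are hypothetical and are not taken from the paper, which may randomize different quantities.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PayloadParams:
    """Hypothetical payload properties randomized per episode."""
    mass_kg: float
    handle_offset_m: float  # assumed: handle offset from the payload's center of mass
    drag_coeff: float


# Assumed ranges for illustration only; not values from the paper.
RANGES = {
    "mass_kg": (0.05, 0.50),
    "handle_offset_m": (0.00, 0.10),
    "drag_coeff": (0.5, 1.5),
}


def sample_payload(rng: random.Random) -> PayloadParams:
    """Draw each parameter independently and uniformly from its range."""
    return PayloadParams(
        **{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is reproducible
    for episode in range(3):
        print(episode, sample_payload(rng))
```

Because the policy never observes these parameters directly in such a setup, it has to infer their effect from how the vehicle responds, which is the role the summary assigns to the context encoder.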

📝 Abstract

Unmanned aerial vehicles (UAVs) are increasingly deployed in logistics, service robotics, and other real-world applications, creating a growing demand for autonomous payload acquisition and delivery. Existing approaches typically assume pre-attached payloads or rely on specialized grippers, leaving versatile end-to-end aerial delivery largely unresolved. The core difficulty is that different payloads induce highly variable flight dynamics, so a single policy must adapt online without manual calibration or explicit system identification. To this end, we study Autonomous Aerial Manipulation via Contextual Contrastive Meta Reinforcement Learning (Aco2), a fully autonomous aerial delivery setting in which a quadrotor equipped with a lightweight hook continuously picks up, transports, and delivers diverse handle-equipped objects between randomized locations, all without human intervention. We first design a contextual observation encoder that infers a compact latent context from recent interaction history, enabling the policy to adapt online to payload-dependent dynamics. To further improve the quality of this context, we then introduce a contrastive objective that structures the context embedding around task-relevant variations, improving generalization across diverse payloads without requiring explicit system identification. Trained entirely in simulation with extensive domain randomization, Aco2 can be directly deployed on a physical quadrotor without real-world fine-tuning.
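The abstract does not state the exact form of the contrastive objective. A common choice for structuring an embedding space is the InfoNCE loss, sketched below under that assumption: context embeddings computed from two history windows of the same payload form a positive pair, and embeddings from other payloads in the batch serve as negatives. The function name `info_nce` and the `temperature` value are illustrative, not the authors' implementation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a nonzero vector to unit length (zero vectors are not handled)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are assumed to come from the same payload;
    positives[j] for j != i act as negatives for anchors[i].
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [dot(a_i, p_j) / temperature for p_j in p]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_sum - logits[i]
    return total / len(a)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print("matched pairs:", info_nce(anchors, matched))
    print("swapped pairs:", info_nce(anchors, swapped))
```

The loss is small when each anchor is closest to its own positive and large when the pairs are mismatched, which is the pressure that would pull embeddings of the same payload together and push different payloads apart.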

Problem

Research questions and friction points this paper is trying to address.

Autonomous Aerial Manipulation

Payload Adaptation

End-to-End Aerial Delivery

Variable Flight Dynamics

Online Adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual Contrastive Learning

Meta Reinforcement Learning

Autonomous Aerial Manipulation