Unsupervised Skill Discovery as Exploration for Learning Agile Locomotion

📅 2025-08-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Legged robots typically rely on hand-crafted reward functions, expert demonstrations, or curricula to acquire agile locomotion skills, which limits autonomous skill acquisition in complex, cluttered environments. Method: We propose Skill Discovery as Exploration (SDAX), a framework that combines unsupervised skill discovery with a bi-level optimization process that automatically adjusts the degree of exploration during training. This lets a diverse set of obstacle-crossing locomotion skills emerge with far less human engineering. Contribution/Results: Compared with conventional reinforcement learning pipelines, SDAX substantially reduces the need for reward engineering, demonstration data, and curriculum design. In simulation, it acquires highly agile behaviors, including crawling, climbing, leaping, and jumping off vertical walls. The learned policy is also deployed on a real quadrupedal robot, validating its transfer from simulation to the real world.
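
The summary does not say which skill-discovery objective SDAX uses. As a purely illustrative sketch, the example below uses a DIAYN-style reward, a well-known unsupervised skill-discovery objective swapped in here as an assumption: the agent receives r = log q(z|s) - log p(z), which is positive when a discriminator q identifies the active skill z from the state s better than chance. The linear-softmax discriminator, the uniform skill prior, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_probs(state: Sequence[float],
                        weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear discriminator q(z|s): one weight row per skill z."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)


def skill_reward(state: Sequence[float], skill: int,
                 weights: Sequence[Sequence[float]]) -> float:
    """DIAYN-style intrinsic reward log q(z|s) - log p(z), with a uniform prior p(z)."""
    num_skills = len(weights)
    q = discriminator_probs(state, weights)
    return math.log(q[skill]) - math.log(1.0 / num_skills)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # two skills, 2-D state
    print(skill_reward([2.0, 0.0], 0, W))  # state is easy to attribute to skill 0
    print(skill_reward([2.0, 0.0], 1, W))  # state is atypical of skill 1
    print(skill_reward([0.0, 0.0], 0, W))  # discriminator is at chance
```

With these weights, the state [2.0, 0.0] earns a positive reward under skill 0 and a negative one under skill 1, while a state the discriminator cannot attribute earns zero. Diversity emerges because each skill is rewarded for reaching states that the other skills do not.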

📝 Abstract

Exploration is crucial for enabling legged robots to learn agile locomotion behaviors that can overcome diverse obstacles. However, such exploration is inherently challenging, and we often rely on extensive reward engineering, expert demonstrations, or curriculum learning, all of which limit generalizability. In this work, we propose Skill Discovery as Exploration (SDAX), a novel learning framework that significantly reduces human engineering effort. SDAX leverages unsupervised skill discovery to autonomously acquire a diverse repertoire of skills for overcoming obstacles. To regulate the level of exploration during training, SDAX employs a bi-level optimization process that automatically adjusts the degree of exploration. We demonstrate that SDAX enables quadrupedal robots to acquire highly agile behaviors, including crawling, climbing, leaping, and complex maneuvers such as jumping off vertical walls. Finally, we deploy the learned policy on real hardware, validating its successful transfer to the real world.
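
The abstract describes the bi-level optimization only at a high level, so the toy sketch below is an assumption and not SDAX's actual procedure. An inner level trains a parameter theta for a short, fixed budget on task reward plus lam times an exploration bonus. An outer level then tunes lam by gradient ascent on task return alone, using a central finite difference through the inner training run. The objectives task(theta) = -(theta - 3)^2 and bonus(theta) = theta, and all function names, are hypothetical stand-ins for an RL policy and a skill-discovery reward.

```python
def inner_train(lam: float, steps: int = 5, lr: float = 0.1) -> float:
    """Inner level: a short run of gradient ascent on task + lam * bonus.

    Hypothetical toy objectives: task(theta) = -(theta - 3)^2, bonus(theta) = theta.
    The fixed budget stands in for a learner that cannot reach the task
    optimum on its own.
    """
    theta = 0.0
    for _ in range(steps):
        grad = -2.0 * (theta - 3.0) + lam  # gradient of task(theta) + lam * bonus(theta)
        theta += lr * grad
    return theta


def task_return(theta: float) -> float:
    """Outer objective: task performance only, without the exploration bonus."""
    return -(theta - 3.0) ** 2


def tune_exploration_weight(lam: float = 0.0, outer_steps: int = 200,
                            outer_lr: float = 1.0, eps: float = 1e-3) -> float:
    """Outer level: gradient ascent on lam through the inner training run."""
    for _ in range(outer_steps):
        # Central finite difference of the post-training task return w.r.t. lam.
        up = task_return(inner_train(lam + eps))
        down = task_return(inner_train(lam - eps))
        lam += outer_lr * (up - down) / (2.0 * eps)
    return lam


if __name__ == "__main__":
    lam = tune_exploration_weight()
    print(f"tuned lam = {lam:.3f}")
    print(f"task return, lam=0:     {task_return(inner_train(0.0)):.4f}")
    print(f"task return, tuned lam: {task_return(inner_train(lam)):.4f}")
```

Because the inner learner stops after five steps, it undershoots the task optimum when lam = 0. The outer loop raises lam until the bonus compensates, converging to 6 / (1 - 0.8**5) - 6 (about 2.92) for these settings. In a real setting, every outer evaluation would involve RL training, so this exact finite-difference scheme is only illustrative.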

Problem

Research questions and challenges this paper addresses.

Enabling legged robots to learn agile locomotion autonomously

Reducing reliance on human engineering for skill acquisition

Dynamically adjusting exploration levels during robot training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised skill discovery for agile locomotion

Bi-level optimization for dynamic exploration

Real-world deployment of learned policies
