🤖 AI Summary
Opening unfamiliar doors in real-world environments is difficult for humanoid robots because the task is long-horizon and only partially observable. In particular, the door's latch state cannot be observed, so the correct action sequence is ambiguous (e.g., whether to rotate the handle or push directly), and standard behavior cloning is prone to mode collapse. To address this, we propose StageACT, a stage-conditioned imitation learning framework that decomposes the task into perception, decision-making, and execution stages. Explicit stage labels serve as auxiliary inputs to the policy, which mitigates uncertainty under partial observability and allows recovery behaviors to be triggered by prompting a stage. Our method jointly encodes low-level motor control and high-level stage semantics, achieving end-to-end door opening on a physical humanoid robot. In experiments, it reaches a 55% success rate on previously unseen office doors, more than double that of the best baseline, while also shortening completion time.
📝 Abstract
Humanoid robots promise to operate in everyday human environments without requiring modifications to the surroundings. Among the many skills needed, opening doors is essential, as doors are the most common gateways in built spaces and often limit where a robot can go. Door opening, however, poses unique challenges: it is a long-horizon task under partial observability, requiring the robot to reason about the door's unobservable latch state, which dictates whether it should rotate the handle or push the door. This ambiguity makes standard behavior cloning prone to mode collapse, yielding blended or out-of-sequence actions. We introduce StageACT, a stage-conditioned imitation learning framework that augments low-level policies with task-stage inputs. This addition increases robustness to partial observability, leading to higher success rates and shorter completion times. On a humanoid operating in a real-world office environment, StageACT achieves a 55% success rate on previously unseen doors, more than doubling the best baseline. Moreover, our method supports intentional behavior guidance through stage prompting, enabling recovery behaviors. These results highlight stage conditioning as a lightweight yet powerful mechanism for long-horizon humanoid loco-manipulation.
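The idea of stage conditioning can be illustrated with a toy sketch. It is not the StageACT implementation: the stage names, the two-dimensional observation, and the linear least-squares policy are all assumptions chosen to keep the example small. It shows why cloning two demonstrations that share the same observation but call for different actions produces a blended action, and how appending a one-hot stage label to the policy input separates them.

```python
import numpy as np

# Hypothetical stage names, for illustration only; the source does not list them.
STAGES = ("approach", "rotate_handle", "push_door")


def one_hot_stage(stage: str) -> np.ndarray:
    """Encode a stage name as a one-hot vector over STAGES."""
    vec = np.zeros(len(STAGES))
    vec[STAGES.index(stage)] = 1.0
    return vec


def stage_conditioned_input(observation: np.ndarray, stage: str) -> np.ndarray:
    """Append the one-hot stage label to the observation vector."""
    return np.concatenate([observation, one_hot_stage(stage)])


def fit_linear_policy(inputs: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Behavior cloning with a linear policy: least-squares fit of actions."""
    weights, *_ = np.linalg.lstsq(inputs, actions, rcond=None)
    return weights


# Toy demonstrations: the observation is identical (the latch state is hidden),
# but the correct action differs depending on the task stage.
obs = np.array([1.0, 0.5])
demos = [
    ("rotate_handle", np.array([1.0, 0.0])),  # action = "rotate"
    ("push_door", np.array([0.0, 1.0])),      # action = "push"
]
targets = np.stack([action for _, action in demos])

# Without stage input, the policy can only output the average of both actions.
plain_inputs = np.stack([obs for _ in demos])
plain_weights = fit_linear_policy(plain_inputs, targets)
blended = obs @ plain_weights

# With stage input, the same observation maps to distinct actions per stage.
stage_inputs = np.stack([stage_conditioned_input(obs, s) for s, _ in demos])
stage_weights = fit_linear_policy(stage_inputs, targets)
rotate_action = stage_conditioned_input(obs, "rotate_handle") @ stage_weights
push_action = stage_conditioned_input(obs, "push_door") @ stage_weights

print("unconditioned:", np.round(blended, 3))
print("stage=rotate_handle:", np.round(rotate_action, 3))
print("stage=push_door:", np.round(push_action, 3))
```

In this sketch, choosing the stage passed to `stage_conditioned_input` at inference time plays the role of stage prompting: the same observation yields a different action depending on which stage is supplied.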