Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning

📅 2026-04-22

🤖 AI Summary

This work addresses the reliance on human-defined subtasks and annotated data in instruction-following tasks. It proposes SuperIgor, a framework that co-trains a language model and a reinforcement learning agent without any predefined subtasks. The language model generates high-level plans, a goal-conditioned reinforcement learning agent executes them, and preference-based feedback drives iterative refinement of the plans, closing the co-training loop. The reported experiments indicate that SuperIgor reduces dependence on human annotations, follows instructions more strictly than baseline methods in complex, dynamic environments, and generalizes to unseen instructions.

📝 Abstract

We introduce SuperIgor, a framework for instruction-following tasks. Unlike prior methods that rely on predefined subtasks, SuperIgor enables a language model to generate and refine high-level plans through a self-learning mechanism, reducing the need for manual dataset annotation. Our approach involves iterative co-training: an RL agent is trained to follow the generated plans, while the language model adapts and modifies these plans based on RL feedback and preferences. This creates a feedback loop where both the agent and the planner improve jointly. We validate our framework in environments with rich dynamics and stochasticity. Results show that SuperIgor agents adhere to instructions more strictly than baseline methods, while also demonstrating strong generalization to previously unseen instructions.
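The co-training loop described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the planner below is a simple score table standing in for the language model, the agent is a table of per-subgoal success probabilities standing in for the goal-conditioned RL policy, and every name (`ToyPlanner`, `ToyAgent`, `co_train`, the plan and subgoal strings) is hypothetical. It only illustrates the shape of the loop: the agent trains on the proposed plans, rollout success rates are turned into a preference between plans, and the planner is updated toward the preferred plan.

```python
import random

# Hypothetical toy setup: every name here is illustrative, not from the paper.
CANDIDATE_PLANS = {
    "direct": ["collect wood", "make table"],
    "detour": ["collect wood", "collect stone", "make table"],
}


class ToyPlanner:
    """Stand-in for the language model: keeps one preference score per plan."""

    def __init__(self, plan_names):
        self.scores = {name: 0.0 for name in plan_names}

    def update(self, preferred, rejected, step=1.0):
        # Preference-style update: raise the winner, lower the loser.
        self.scores[preferred] += step
        self.scores[rejected] -= step

    def best_plan(self):
        return max(self.scores, key=self.scores.get)


class ToyAgent:
    """Stand-in for the goal-conditioned agent: one success probability per subgoal."""

    def __init__(self, subgoals, rng):
        self.skill = {goal: 0.5 for goal in subgoals}
        self.rng = rng

    def rollout(self, plan):
        # A plan succeeds only if every subgoal in it succeeds.
        for goal in plan:
            if self.rng.random() >= self.skill[goal]:
                return False
        return True

    def train_on(self, plan, gain=0.05):
        # Crude proxy for RL training: practice raises skill, capped at 0.95.
        for goal in plan:
            self.skill[goal] = min(0.95, self.skill[goal] + gain)


def co_train(rounds=20, episodes=50, seed=0):
    rng = random.Random(seed)
    subgoals = sorted({g for plan in CANDIDATE_PLANS.values() for g in plan})
    planner = ToyPlanner(CANDIDATE_PLANS)
    agent = ToyAgent(subgoals, rng)
    for _ in range(rounds):
        success = {}
        for name, plan in CANDIDATE_PLANS.items():
            agent.train_on(plan)  # agent learns to follow the plan
            wins = sum(agent.rollout(plan) for _ in range(episodes))
            success[name] = wins / episodes
        better, worse = sorted(success, key=success.get, reverse=True)
        if success[better] > success[worse]:  # skip ties: no preference signal
            planner.update(preferred=better, rejected=worse)
    return planner, agent


if __name__ == "__main__":
    planner, agent = co_train()
    print("planner scores:", planner.scores)
    print("preferred plan:", planner.best_plan())
```

In this toy version both sides change across rounds: the agent's skills grow with practice, and the planner's scores drift toward whichever plan the agent completes more reliably. A real system would replace the score table with a language model fine-tuned on preference pairs and the skill table with a learned policy.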

Problem

Research questions and friction points this paper is trying to address.

instruction-following

plan extraction

goal-conditional reinforcement learning

self-guided learning

generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-guided planning

goal-conditional reinforcement learning

instruction-following