Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning

📅 2026-04-22

🤖 AI Summary
This work addresses the reliance of instruction-following methods on human-defined subtasks and annotated data. It proposes SuperIgor, a framework in which a language model and a reinforcement learning agent learn together without any predefined subtasks. The language model autonomously generates high-level plans, a goal-conditioned reinforcement learning agent executes them, and preference-based feedback from that execution drives iterative refinement of the plans, closing the co-training loop. Experiments show that SuperIgor substantially reduces dependence on human annotations, follows instructions more faithfully in complex, dynamic environments, and generalizes well to unseen instructions.
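
The summary says preference-based feedback drives plan refinement. One plausible way to turn RL execution results into such feedback is to compare plans generated for the same instruction and mark the better-executed one as "chosen". The sketch assumes a hypothetical `PlanOutcome` record, a made-up scoring rule (instruction success first, subgoal progress as a tiebreaker), and the prompt/chosen/rejected record layout used by pairwise preference-optimization methods such as DPO; the text here does not say which preference method SuperIgor actually uses, and the instructions and subgoal strings are illustrative only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class PlanOutcome:
    instruction: str
    plan: list            # hypothetical: ordered subgoal strings produced by the language model
    subgoals_done: int    # subgoals the goal-conditioned agent actually completed
    instruction_met: bool


def score(o):
    # Hypothetical scoring rule: instruction success dominates, subgoal progress breaks ties.
    progress = o.subgoals_done / len(o.plan) if o.plan else 0.0
    return (1.0 if o.instruction_met else 0.0) + progress


def preference_pairs(outcomes, margin=0.1):
    """Build prompt/chosen/rejected records by comparing plans within each instruction."""
    groups = {}
    for o in outcomes:
        groups.setdefault(o.instruction, []).append(o)
    records = []
    for instr, group in groups.items():
        for a, b in combinations(group, 2):
            sa, sb = score(a), score(b)
            if abs(sa - sb) < margin:
                continue  # near-tie: no reliable preference signal
            chosen, rejected = (a, b) if sa > sb else (b, a)
            records.append({"prompt": instr, "chosen": chosen.plan, "rejected": rejected.plan})
    return records


# Illustrative usage with made-up execution results for one instruction.
outcomes = [
    PlanOutcome("place a table", ["collect wood", "place table"], 2, True),
    PlanOutcome("place a table", ["place table"], 0, False),
    PlanOutcome("place a table", ["collect stone", "place table"], 1, False),
]
pairs = preference_pairs(outcomes)
```

The `margin` parameter is an assumed design choice: skipping near-ties keeps noisy, stochastic rollouts from producing contradictory preference labels.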

📝 Abstract
We introduce SuperIgor, a framework for instruction-following tasks. Unlike prior methods that rely on predefined subtasks, SuperIgor enables a language model to generate and refine high-level plans through a self-learning mechanism, reducing the need for manual dataset annotation. Our approach involves iterative co-training: an RL agent is trained to follow the generated plans, while the language model adapts and modifies these plans based on RL feedback and preferences. This creates a feedback loop where both the agent and the planner improve jointly. We validate our framework in environments with rich dynamics and stochasticity. Results show that SuperIgor agents adhere to instructions more strictly than baseline methods, while also demonstrating strong generalization to previously unseen instructions.
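
The co-training loop described in the abstract can be illustrated with toy, fully hypothetical stand-ins for each component: `ToyPlanner` replaces the language model with one sampling weight per candidate plan, `ToyAgent` replaces the goal-conditioned RL agent with a practice counter per subgoal, and `feedback` scores an executed plan by its progress plus a bonus when the instruction is satisfied. None of these names, the `REQUIRED`/`CANDIDATES` task, or the update rules come from the paper; the sketch only shows how plan generation, agent training, and preference updates can feed each other.

```python
import random

# Hypothetical toy task: "craft_table" is satisfied only by this exact subgoal order.
# The planner never sees REQUIRED; it learns only from execution feedback.
REQUIRED = {"craft_table": ["get_wood", "place_table"]}
CANDIDATES = {
    "craft_table": [["get_wood", "place_table"], ["place_table"], ["get_stone", "place_table"]],
}


class ToyPlanner:
    """Stand-in for the language model: one sampling weight per candidate plan."""

    def __init__(self, candidates, rng):
        self.weights = {i: {tuple(p): 1.0 for p in plans} for i, plans in candidates.items()}
        self.rng = rng

    def propose(self, instr, k=2):
        plans = list(self.weights[instr])
        w = [self.weights[instr][p] for p in plans]
        return [list(p) for p in self.rng.choices(plans, weights=w, k=k)]

    def update(self, instr, chosen, rejected, lr=0.5):
        # Preference step: move weight from the rejected plan to the chosen one.
        table = self.weights[instr]
        table[tuple(chosen)] += lr
        table[tuple(rejected)] = max(0.05, table[tuple(rejected)] - lr)

    def best(self, instr):
        table = self.weights[instr]
        return list(max(table, key=table.get))


class ToyAgent:
    """Stand-in for the goal-conditioned RL agent: a subgoal is mastered after 2 practice episodes."""

    def __init__(self):
        self.practice = {}

    def train(self, plan):
        for goal in plan:  # practice each subgoal the plan conditions on
            self.practice[goal] = self.practice.get(goal, 0) + 1

    def execute(self, plan):
        done = 0
        for goal in plan:  # follow subgoals in order, stop at the first one not yet mastered
            if self.practice.get(goal, 0) < 2:
                break
            done += 1
        return done


def feedback(instr, plan, done):
    # Progress along the plan, plus a bonus if the instruction itself is satisfied.
    solved = done == len(plan) and plan == REQUIRED[instr]
    return done / len(plan) + (1.0 if solved else 0.0)


def co_train(iterations=200, seed=0):
    rng = random.Random(seed)
    planner, agent = ToyPlanner(CANDIDATES, rng), ToyAgent()
    for _ in range(iterations):
        for instr in CANDIDATES:
            a, b = planner.propose(instr)
            agent.train(a)  # RL side: practice following the generated plans
            agent.train(b)
            if a == b:
                continue  # identical plans carry no preference signal
            sa = feedback(instr, a, agent.execute(a))
            sb = feedback(instr, b, agent.execute(b))
            if sa != sb:  # planner side: learn from the preferred plan
                chosen, rejected = (a, b) if sa > sb else (b, a)
                planner.update(instr, chosen, rejected)
    return planner, agent
```

In this toy setup, calling `co_train()` should make the planner's highest-weight plan for `craft_table` drift toward the order the hidden `REQUIRED` check rewards: once the agent has mastered every subgoal, that plan is the only candidate that can earn the success bonus, so it wins every comparison it appears in.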
Problem

instruction-following
plan extraction
goal-conditional reinforcement learning
self-guided learning
generalization
Innovation

self-guided planning
goal-conditional reinforcement learning
instruction-following
iterative co-training
language model adaptation