SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Long-horizon hierarchical planning in text-based environments faces challenges including open-ended action spaces, ambiguous observations, and sparse rewards; existing LLM-dependent approaches suffer from high inference overhead, frozen LLM parameters that cannot adapt to the target task, and inefficient deployment. Method: We propose the "One-Shot Teacher" paradigm: an LLM is invoked only once, at planning initialization, to generate a subgoal sequence; LLM-guided trajectory distillation then pretrains a lightweight student planner (e.g., a Transformer-based planner) for subgoal-conditioned modeling. Contribution/Results: This eliminates repeated LLM calls during both training and inference. On TextCraft, our method achieves a 56% success rate, surpassing ADaPT's 52%, while reducing average inference time from 164.4 seconds to 3.0 seconds (a ~55× speedup), significantly improving both task performance and deployment efficiency.

📝 Abstract
Long-term planning in complex, text-based environments presents significant challenges due to open-ended action spaces, ambiguous observations, and sparse feedback. Recent research suggests that large language models (LLMs) encode rich semantic knowledge about the world, which can be valuable for guiding agents in high-level reasoning and planning across both embodied and purely textual settings. However, existing approaches often depend heavily on querying LLMs during training and inference, making them computationally expensive and difficult to deploy efficiently. In addition, these methods typically employ a pretrained, unaltered LLM whose parameters remain fixed throughout training, providing no opportunity for adaptation to the target task. To address these limitations, we introduce SCOPE (Subgoal-COnditioned Pretraining for Efficient planning), a one-shot hierarchical planner that leverages LLM-generated subgoals only at initialization to pretrain a lightweight student model. Unlike prior approaches that distill LLM knowledge by repeatedly prompting the model to adaptively generate subgoals during training, our method derives subgoals directly from example trajectories. This design removes the need for repeated LLM queries, significantly improving efficiency, though at the cost of reduced explainability and potentially suboptimal subgoals. Despite their suboptimality, our results on the TextCraft environment show that LLM-generated subgoals can still serve as a strong starting point for hierarchical goal decomposition in text-based planning tasks. Compared to the LLM-based hierarchical agent ADaPT (Prasad et al., 2024), which achieves a 0.52 success rate, our method reaches 0.56 and reduces inference time from 164.4 seconds to just 3.0 seconds.
Problem

Research questions and friction points this paper is trying to address.

High computational cost of LLM-based hierarchical planning
Dependence on repeated LLM queries during both training and inference
Fixed, pretrained LLM parameters that cannot adapt to the target task
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-shot hierarchical planner using LLM-generated subgoals
Pretrains lightweight student model from example trajectories
Eliminates repeated LLM queries for efficiency gains
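The one-shot teacher paradigm described above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: `fake_llm_decompose` stands in for the single LLM call at initialization, and the "student" is a trivial lookup-table policy rather than SCOPE's Transformer-based planner; all names and trajectories here are invented for the example.

```python
# Sketch of the one-shot teacher paradigm: one LLM call at init,
# then a distilled student plans with no further LLM queries.
# All identifiers below are illustrative assumptions.

def fake_llm_decompose(goal: str) -> list[str]:
    """Stand-in for the single teacher LLM call: goal -> ordered subgoals."""
    return {"craft stick": ["get planks", "craft stick"]}.get(goal, [goal])

class StudentPlanner:
    """Toy subgoal-conditioned policy distilled from example trajectories."""
    def __init__(self):
        self.policy: dict[tuple[str, str], str] = {}

    def fit(self, trajectories):
        # Each trajectory is a list of (subgoal, observation, action) triples
        # labeled with the teacher's subgoals.
        for traj in trajectories:
            for subgoal, obs, action in traj:
                self.policy[(subgoal, obs)] = action

    def act(self, subgoal: str, obs: str) -> str:
        return self.policy.get((subgoal, obs), "noop")

# --- One-time teacher phase (LLM queried exactly once per goal) ---
goal = "craft stick"
subgoals = fake_llm_decompose(goal)

demo = [
    [("get planks", "see log", "craft planks from log"),
     ("craft stick", "have planks", "craft stick from planks")],
]

student = StudentPlanner()
student.fit(demo)

# --- Deployment phase: no LLM calls at all ---
plan = [student.act(sg, obs)
        for sg, obs in zip(subgoals, ["see log", "have planks"])]
print(plan)  # ['craft planks from log', 'craft stick from planks']
```

The key structural point mirrors the paper's claim: the teacher appears only in the initialization phase, so inference cost is that of the small student alone.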