Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs

📅 2025-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficient learning and generalization for agents operating under stringent trajectory constraints—such as resource limitations or safety-critical requirements—during deployment. To this end, we propose a progressive curriculum learning framework. Methodologically, we introduce curriculum learning to trajectory-constrained reinforcement learning for the first time, employing a self-paced strategy that dynamically tightens constraints from loose to strict. We further integrate binary-tree MDP modeling, a multi-task navigation architecture, and LLM-based chain-of-thought (CoT) compression and inference acceleration at the token level. Experiments demonstrate substantial improvements in training efficiency and policy robustness, validate strong generalization across diverse constraint settings, and achieve significant CoT compression and inference speedup in LLM-driven reasoning.

📝 Abstract
Training agents to operate under strict constraints during deployment, such as limited resource budgets or stringent safety requirements, presents significant challenges, especially when these constraints render the task complex. In this work, we propose a curriculum learning strategy that gradually tightens constraints during training, enabling the agent to incrementally master the deployment requirements. Inspired by self-paced learning techniques in unconstrained reinforcement learning (RL), our approach facilitates a smoother transition to challenging environments by initially training on simplified versions of the constraints and progressively introducing the full deployment conditions. We provide a theoretical analysis using an RL agent in a binary-tree Markov Decision Process (MDP) to demonstrate that our curriculum strategy can accelerate training relative to a baseline approach that imposes the trajectory constraints from the outset. Moreover, we empirically validate the effectiveness and generality of our method across both RL and large language model (LLM) agents in diverse settings, including a binary-tree MDP, a multi-task navigation domain, and a math reasoning task with two benchmarks. These results highlight the potential of curriculum design in enhancing the efficiency and performance of agents operating under complex trajectory constraints during deployment. Finally, when applied to LLMs, our strategy enables compression of output chain-of-thought tokens, achieving a substantial inference speedup on consumer hardware, demonstrating its effectiveness for resource-constrained deployment.
Problem

Research questions and friction points this paper is trying to address.

Training agents efficiently to operate under strict deployment constraints
Accelerating agent adaptation to complex trajectory constraints via curriculum learning
Compressing LLM chain-of-thought tokens for faster inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-paced curriculum learning strategy that gradually tightens trajectory constraints
Training starts with simplified constraints and progresses to the full deployment conditions
Method enables chain-of-thought token compression, yielding inference speedup on consumer hardware
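The core loop described above (start with loose constraints, tighten toward the deployment budget as the agent masters each level) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function and parameter names (`train_step`, `evaluate_agent`, `budget_start`, `success_bar`, the geometric `shrink` factor) are all hypothetical stand-ins for whatever training loop, success metric, and tightening schedule are actually used.

```python
def constraint_curriculum(train_step, evaluate_agent,
                          budget_start=512, budget_final=128,
                          shrink=0.9, success_bar=0.8, max_rounds=200):
    """Self-paced tightening of a trajectory budget from loose to strict.

    The budget (e.g. max environment steps, or max CoT tokens for an LLM)
    shrinks geometrically, but only once the agent's success rate under
    the *current* constraint clears `success_bar`.
    """
    budget = budget_start
    for _ in range(max_rounds):
        train_step(budget)                   # one training round under the current budget
        if evaluate_agent(budget) >= success_bar:
            if budget == budget_final:       # already at deployment constraint and passing
                break
            budget = max(budget_final, int(budget * shrink))
    return budget
```

With dummy callbacks (an agent that always "succeeds"), the budget walks down from 512 to the deployment value 128 over successive rounds; a weaker agent would simply spend more rounds at each level before the constraint tightens.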