🤖 AI Summary
This work addresses a common limitation of large language model (LLM) agents: they lack proactive awareness of computational budgets, so cost is typically measured only after execution rather than managed during it. The authors propose the Budget-Aware Agent (BAGEN), which treats budget as an active control signal. They define internal budgets (from agent computation) and external budgets (from agent actions), and frame budget-awareness as progressive interval estimation: at each step, the agent predicts lower and upper bounds on the remaining budget and alerts when completion is unlikely. Evaluated with a rollout-replay protocol across four environments and five frontier agents, task strength correlates only weakly with budget-awareness (r=0.35), and frontier models are consistently over-optimistic, continuing to spend on tasks that are unlikely to succeed. Early stopping saves 28%–64% of tokens on failed trajectories, and training that combines supervised fine-tuning and reinforcement learning (SFT+RL) strengthens early-stop and alert behavior. Precise interval calibration remains difficult: interval coverage caps at 47% even after SFT+RL.
📝 Abstract
While agents consume ever more resources, agent cost today is mostly measured only after execution. A Budget-Aware Agent (BAGEN) should treat budget as an active control signal rather than a passive cost metric. We first systematically define budgets as internal budgets (from agent computation) and external budgets (from agent actions). We then formalize budget-awareness as progressive interval estimation: at each step of a plan, an agent should predict a lower and an upper bound on the remaining budget, and alert when completion is unlikely. Scoring with a rollout-replay protocol, we find consistent failure patterns across four environments and five frontier agents: (1) strong agents do not necessarily have strong budget-awareness, with correlation r=0.35; (2) frontier models are consistently over-optimistic and continue spending on tasks that are unlikely to succeed instead of alerting the user early; (3) the budget-aware signal is actionable and trainable: early stopping saves 28-64% of tokens on failed trajectories, and SFT+RL strengthens early-stop and alert behavior; (4) precise interval calibration remains challenging, with interval coverage capping at 47% after SFT+RL. Project page: https://ragen-ai.github.io/bagen/
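To make the interval-estimation framing concrete, here is a minimal Python sketch of how per-step interval coverage and an early alert might be computed. It is an illustration under stated assumptions, not the paper's implementation: the `StepEstimate` structure, the function names, and the alert rule (alert once even the optimistic lower bound no longer fits in the budget) are all hypothetical, and the rollout-replay scoring protocol is not reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepEstimate:
    """One per-step prediction of remaining budget (hypothetical structure)."""
    lower: float             # predicted lower bound on remaining tokens
    upper: float             # predicted upper bound on remaining tokens
    actual_remaining: float  # tokens actually spent after this step


def interval_coverage(estimates: List[StepEstimate]) -> float:
    """Fraction of steps whose true remaining cost lies inside [lower, upper]."""
    if not estimates:
        return 0.0
    hits = sum(1 for e in estimates if e.lower <= e.actual_remaining <= e.upper)
    return hits / len(estimates)


def first_alert_step(
    lower_bounds: List[float], spent: List[float], budget: float
) -> Optional[int]:
    """Index of the first step where spent + lower bound exceeds the budget.

    Assumed alert rule: if even the optimistic estimate does not fit in the
    total budget, completion is treated as unlikely. Returns None if the
    rule never fires.
    """
    for i, (lo, used) in enumerate(zip(lower_bounds, spent)):
        if used + lo > budget:
            return i
    return None


# Toy trajectory: the true remaining cost falls inside two of four intervals.
estimates = [
    StepEstimate(lower=100, upper=300, actual_remaining=250),
    StepEstimate(lower=50, upper=150, actual_remaining=200),
    StepEstimate(lower=10, upper=80, actual_remaining=60),
    StepEstimate(lower=0, upper=20, actual_remaining=40),
]
coverage = interval_coverage(estimates)  # 0.5

# With a 500-token budget, the alert fires at step index 2 (400 + 300 > 500).
alert_at = first_alert_step([100, 150, 300], [50, 200, 400], budget=500)
```

In this toy trajectory, both misses occur because the true remaining cost exceeds the predicted upper bound, which is the direction an over-optimistic estimator would err in.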