Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of existing large language model (LLM) agents, which treat all reasoning steps and observations uniformly across multi-turn interactions despite their varying utility. To overcome this limitation, we propose Agent-Omit, a novel framework that, for the first time, quantifies the impact of individual thoughts and observations on agent performance and introduces an adaptive omission strategy with provably bounded bias. Our approach integrates small-scale cold-start fine-tuning, omission-aware reinforcement learning, a dual-sampling mechanism, and a tailored reward function. Experimental results demonstrate that Agent-Omit-8B achieves state-of-the-art performance across five benchmarks and consistently outperforms seven efficient LLM agent methods on the efficiency-effectiveness trade-off.

📝 Abstract
Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy for improving agent efficiency. However, existing studies treat entire interaction trajectories equally, overlooking that thought necessity and observation utility vary across turns. To this end, we first conduct quantitative investigations into how thought and observation affect agent effectiveness and efficiency. Based on our findings, we propose Agent-Omit, a unified training framework that empowers LLM agents to adaptively omit redundant thoughts and observations. Specifically, we first synthesize a small amount of cold-start data, covering both single-turn and multi-turn omission scenarios, to fine-tune the agent for omission behaviors. Furthermore, we introduce an omit-aware agentic reinforcement learning approach, incorporating a dual sampling mechanism and a tailored omission reward to incentivize the agent's adaptive omission capability. Theoretically, we prove that the deviation of our omission policy is upper-bounded by the KL divergence. Experimental results on five agent benchmarks show that our Agent-Omit-8B obtains performance comparable to seven frontier LLM agents and achieves a better effectiveness-efficiency trade-off than seven efficient LLM agent methods. Our code and data are available at https://github.com/usail-hkust/Agent-Omit.
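The abstract describes a tailored omission reward that must balance task success against the tokens saved by omitted thoughts and observations. The paper does not specify the reward's exact form here, so the sketch below is only an illustrative shape under that assumption; the function name, `alpha` weight, and multiplicative gating are all hypothetical, not the authors' definition.

```python
def omission_reward(task_success: bool, tokens_used: int, tokens_full: int,
                    alpha: float = 0.5) -> float:
    """Illustrative omission reward: task outcome gated over token savings.

    `tokens_full` is the trajectory length with no omission; `tokens_used`
    is the length after omitting redundant thoughts/observations.
    `alpha` trades off efficiency against correctness (assumed, not from
    the paper).
    """
    success = 1.0 if task_success else 0.0
    savings = 1.0 - tokens_used / tokens_full  # fraction of tokens omitted
    # Reward efficiency only on successful trajectories, so the agent is not
    # incentivized to omit content at the cost of task correctness.
    return success * (1.0 + alpha * savings)

# A successful trajectory that omitted 40% of its tokens scores 1.2;
# a failed trajectory scores 0 regardless of savings.
print(omission_reward(True, tokens_used=600, tokens_full=1000))   # 1.2
print(omission_reward(False, tokens_used=600, tokens_full=1000))  # 0.0
```

Gating the efficiency bonus on success is one simple way to keep the omission policy's deviation controlled, consistent with the paper's KL-bounded-deviation guarantee, though the actual reward Agent-Omit uses may differ.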
Problem

Research questions and friction points this paper is trying to address.

agent efficiency
thought omission
observation utility
multi-turn interaction
LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive omission
agentic reinforcement learning
thought pruning
observation utility
efficient LLM agents
Yansong NING
The Hong Kong University of Science and Technology (Guangzhou)
LLM reasoning · agent · knowledge graph · urban computing
Jun Fang
Didichuxing Co. Ltd
Naiqiang Tan
Didichuxing Co. Ltd
Hao Liu
CSE, The Hong Kong University of Science and Technology