TVCACHE: A Stateful Tool-Value Cache for Post-Training LLM Agents

📅 2026-02-11

🤖 AI Summary

This work addresses an inefficiency in the reinforcement learning (RL) post-training of large language model (LLM) agents: frequent, slow external tool invocations leave GPUs underutilized and drive up computational costs. Conventional caching strategies ignore that a tool's output depends on the environment state left by earlier calls, so reusing their results can be incorrect. To resolve this, the authors propose a stateful tool-value cache that records observed tool-call sequences in a tree and uses longest-prefix matching, reusing a result only when the agent's entire tool-call history matches a previously executed sequence and the environment state is therefore identical. The approach enables state-aware cache reuse for the first time, achieving cache hit rates of up to 70% across terminal interaction, SQL generation, and video understanding tasks, and reducing median tool execution time by up to 6.9× without degrading accumulated reward.
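Why naive caching breaks can be shown with a toy example. The sketch below is illustrative only and is not from the paper: the in-memory file system, the `write`/`cat` tools, and the `(tool, argument)` encoding are all hypothetical. It compares a cache keyed on the current call alone with one keyed on the full call history. For simplicity, `execute` replays the whole history on a fresh environment instead of maintaining a live one.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a tool call: (tool name, argument string).
ToolCall = Tuple[str, str]


def execute(history: List[ToolCall]) -> str:
    """Replay a tool-call sequence on a fresh toy file system; return the last call's output."""
    files: Dict[str, str] = {}
    output = ""
    for tool, arg in history:
        if tool == "write":  # hypothetical tool: "name=content" overwrites a file
            name, _, content = arg.partition("=")
            files[name] = content
            output = "ok"
        elif tool == "cat":  # hypothetical tool: read a file
            output = files.get(arg, "<missing>")
        else:
            raise ValueError(f"unknown tool: {tool}")
    return output


naive_cache: Dict[ToolCall, str] = {}
stateful_cache: Dict[Tuple[ToolCall, ...], str] = {}


def naive_call(history: List[ToolCall]) -> str:
    """Cache keyed on the current call only; ignores the state left by earlier calls."""
    key = history[-1]
    if key not in naive_cache:
        naive_cache[key] = execute(history)
    return naive_cache[key]


def stateful_call(history: List[ToolCall]) -> str:
    """Cache keyed on the full call history, which determines the environment state."""
    key = tuple(history)
    if key not in stateful_cache:
        stateful_cache[key] = execute(history)
    return stateful_cache[key]


rollout_a = [("write", "a.txt=1"), ("cat", "a.txt")]
rollout_b = [("write", "a.txt=2"), ("cat", "a.txt")]

for rollout in (rollout_a, rollout_b):
    print(naive_call(rollout), stateful_call(rollout))
# prints "1 1", then "1 2": the naive cache returns a stale "1" for rollout_b
```

Both rollouts issue the identical call `cat a.txt`, but the preceding `write` leaves a different state. Only the history-keyed cache returns the correct value.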

📝 Abstract

In RL post-training of LLM agents, calls to external tools take several seconds or even minutes, leaving allocated GPUs idle and inflating post-training time and cost. While many tool invocations repeat across parallel rollouts and could in principle be cached, naively caching their outputs for reuse is incorrect since tool outputs depend on the environment state induced by prior agent interactions. We present TVCACHE, a stateful tool-value cache for LLM agent post-training. TVCACHE maintains a tree of observed tool-call sequences and performs longest-prefix matching for cache lookups: a hit occurs only when the agent's full tool history matches a previously executed sequence, guaranteeing identical environment state. On three diverse workloads (terminal-based tasks, SQL generation, and video understanding), TVCACHE achieves cache hit rates of up to 70% and reduces median tool-call execution time by up to 6.9×, with no degradation in post-training reward accumulation.
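The lookup rule in the abstract can be sketched as a prefix tree over tool calls. This is a minimal illustration under stated assumptions, not TVCACHE's implementation: the `ToolValueTree` and `Node` names and the `(tool, argument)` call encoding are hypothetical, tool outputs are assumed deterministic given the call history, and concurrent inserts from parallel rollouts are not handled. `lookup` walks the tree along the agent's history and reports a hit only if the entire history, including the current call, is present. It also returns the length of the longest matched prefix. How TVCACHE uses that prefix on a miss (for example, to restore environment state) is not described in the abstract, so that step is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of a tool call: (tool name, argument string).
ToolCall = Tuple[str, str]


@dataclass
class Node:
    output: Optional[str] = None  # output of the call that leads to this node
    children: Dict[ToolCall, "Node"] = field(default_factory=dict)


class ToolValueTree:
    """Prefix tree of observed tool-call sequences (illustrative sketch)."""

    def __init__(self) -> None:
        self.root = Node()

    def lookup(self, history: List[ToolCall]) -> Tuple[Optional[str], int]:
        """Longest-prefix match; returns (cached output or None, matched length).

        A hit requires the whole history, including the current call, to match.
        """
        node, depth = self.root, 0
        for call in history:
            child = node.children.get(call)
            if child is None:
                break
            node, depth = child, depth + 1
        hit = depth == len(history) and node.output is not None
        return (node.output if hit else None), depth

    def insert(self, history: List[ToolCall], outputs: List[str]) -> None:
        """Record an executed sequence; rollouts with a common prefix share nodes."""
        node = self.root
        for call, out in zip(history, outputs):
            node = node.children.setdefault(call, Node())
            node.output = out


tree = ToolValueTree()
tree.insert([("write", "a.txt=1"), ("cat", "a.txt")], ["ok", "1"])

print(tree.lookup([("write", "a.txt=1"), ("cat", "a.txt")]))  # ('1', 2): full match, hit
print(tree.lookup([("write", "a.txt=2"), ("cat", "a.txt")]))  # (None, 0): different state, miss
print(tree.lookup([("write", "a.txt=1"), ("ls", ".")]))       # (None, 1): only a prefix matches
```

Keying each node on the call rather than storing whole sequences as flat keys lets parallel rollouts that diverge late share all of their common prefix.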

Problem

Research questions and friction points this paper is trying to address.

LLM agent

post-training

tool caching

environment state

GPU idle

Innovation

Methods, ideas, or system contributions that make the work stand out.

stateful caching

tool-value cache

LLM agent post-training