Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents

📅 2026-06-11

🤖 AI Summary

This work addresses the high latency and limited reliability of existing code-based policy generation methods for open-domain embodied environments, where redundant prefill computation and fully generative decoding often lead to API mismatches, missing safety guards, and unstable control logic. To overcome these limitations, the authors propose a function-level cache grafting mechanism that builds a library of verified code skeletons together with their corresponding Transformer key-value (KV) caches. When a new task arrives, the system retrieves relevant functions from this library, reuses their cached representations, and synthesizes a policy through cache stitching and localized patching. This approach avoids redundant computation and reuses proven control structures, which keeps policies robust while reducing generation time. Compared with prompt-level caching baselines such as RAGCache, it achieves an 18.31% higher task success rate and 2.3× faster generation.

📝 Abstract

Code-writing large language models (CodeLLMs) generate executable code policies for embodied agents by translating natural language goals and environmental constraints into structured control programs. However, policy generation in open-domain embodied environments suffers from two fundamental limitations: (i) delayed decoding caused by repetitive prefill computation over long prompts, and (ii) limited robustness due to fully generative decoding, which often produces API mismatches, missing safety guards, and unstable control logic. To address these limitations, we present FCGraft, a Functional Cache Grafting framework. FCGraft maintains a library of function-level validated code skeletons and their associated prompt-level Transformer key-value (KV) caches, and synthesizes new policies by retrieving relevant functions and grafting their KV caches when a new task is provided. Given the retrieved function caches, FCGraft performs cache grafting in two steps: stitching, which composes cached function segments into a composite policy, and patching, which locally adapts only the necessary code regions to satisfy task-specific parameters and constraints with minimal additional decoding. Eliminating redundant prefill computation reduces generation latency, and reusing validated control structures improves robustness. Compared with prompt-level caching methods such as RAGCache, FCGraft achieves an 18.31% higher task success rate and 2.3× faster policy synthesis.
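
The retrieve, stitch, and patch steps described in the abstract can be illustrated with a toy Python sketch. This is not the authors' implementation: the `CachedFunction` type, the word-overlap retriever, the `{placeholder}` patching scheme, the robot method names, and the string lists standing in for KV caches are all hypothetical, chosen only to show how the three steps fit together. A real system would graft actual Transformer KV tensors and decode only the patched regions.

```python
import ast
from dataclasses import dataclass


@dataclass
class CachedFunction:
    """One library entry: a validated skeleton plus a stand-in for its KV cache."""
    name: str
    description: str
    skeleton: str   # code with {placeholders} for task-specific values (assumed scheme)
    kv_cache: list  # hypothetical placeholder for per-token KV tensors


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a stand-in for a real retriever)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve(library, task, k=2):
    """Return the k entries whose descriptions best match the task text."""
    ranked = sorted(library, key=lambda f: token_overlap(f.description, task),
                    reverse=True)
    return ranked[:k]


def stitch(functions):
    """Compose cached segments: join skeletons and concatenate cache stand-ins."""
    code = "\n\n".join(f.skeleton for f in functions)
    cache = [entry for f in functions for entry in f.kv_cache]
    return code, cache


def patch(code, params):
    """Fill only the task-specific placeholders; the validated structure is untouched."""
    for key, value in params.items():
        code = code.replace("{" + key + "}", repr(value))
    return code


# Hypothetical library of three skeletons with guard conditions already in place.
LIBRARY = [
    CachedFunction(
        "move_to", "move the robot to a target location",
        "def move_to(robot):\n    if robot.path_clear():\n        robot.goto({target})",
        ["kv:move_to:0", "kv:move_to:1"]),
    CachedFunction(
        "pick_up", "pick up an object with the gripper",
        "def pick_up(robot):\n    if robot.sees({obj}):\n        robot.grasp({obj})",
        ["kv:pick_up:0"]),
    CachedFunction(
        "open_door", "open a door in the way",
        "def open_door(robot):\n    robot.push({door})",
        ["kv:open_door:0"]),
]

task = "move to the kitchen and pick up the cup"
selected = retrieve(LIBRARY, task, k=2)
code, cache = stitch(selected)
policy = patch(code, {"target": "kitchen", "obj": "cup"})
ast.parse(policy)  # cheap syntax check on the synthesized policy text
print(policy)
```

In this sketch the guard conditions (`path_clear`, `sees`) come from the stored skeletons rather than from fresh generation, which mirrors the abstract's argument that reusing validated control structures is what improves robustness.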

Problem

Research questions and friction points this paper is trying to address.

embodied agents

code-policy synthesis

generation latency

robustness

prefill computation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Functional Cache Grafting

KV cache reuse

code-policy synthesis