External Experience Serving in Production LLM Systems: A Deployment-Oriented Study of Quality-Cost Trade-offs

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates the trade-off between the quality gains from incorporating external experience into large language models and the associated online serving costs, such as latency, prompt length, and computational overhead, in real-world production settings. Rather than treating external experience as a generic augmentation, the work frames its use as a selective, cost-aware serving decision. It compares four strategies (no-experience baselines, random experience controls, global prompt injection, and retrieval-driven selective injection) on a production content moderation task, with tool use and GPQA as supporting contrast tasks. The findings show that retrieval quality matters more than simply increasing Top-K, and that the same serving strategy yields markedly different cost-benefit profiles on short-output versus decode-heavy tasks. In the settings studied, external experience delivers practical value only when the task's cost structure and the serving interface make its quality improvements worth the online cost, and selective retrieval provides a stronger operating point than global injection once experience is case-dependent.

📝 Abstract

Production LLM systems accumulate reusable operational experience, but the practical deployment issue is not merely whether such experience can help. It is how different serving strategies trade off quality against online cost under realistic constraints. Injecting external experience can improve task quality, yet it also increases prompt burden, latency, and serving pressure. We study external experience serving as a deployment-oriented quality-cost trade-off problem. We evaluate this question in a real production moderation setting, with tool-use and GPQA as supporting contrast tasks that expose different output-cost regimes. We compare no-experience baselines, random experience controls, global prompt injection, and retrieval-based selective injection, and analyze both task quality and serving cost. The results show that, once experience becomes case-dependent, selective retrieval provides a stronger operating point than unconditional global injection. They further show that retrieval quality matters more than simply increasing Top-K, and that the same serving policy can exhibit substantially different cost-benefit profiles across short-output and decode-heavy regimes. These findings suggest that external experience is best treated as a selective, cost-aware serving decision rather than as a universal add-on. Overall, in the settings studied here, external experience pays off only when both the serving interface and the task-specific cost structure make its quality gains worth the online cost.
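To make the four serving strategies named in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. Everything in it is an assumption made for illustration: the `Experience` record, the token-overlap scorer (a stand-in for whatever retriever a real system would use), the `min_score` relevance threshold, the toy experience pool, and the whitespace word count used as a rough proxy for prompt length.

```python
import random
from dataclasses import dataclass


@dataclass
class Experience:
    """Hypothetical record for one piece of reusable operational experience."""
    text: str
    keywords: frozenset


def overlap_score(query: str, exp: Experience) -> int:
    """Toy relevance score: number of query tokens found in the experience keywords."""
    return len(set(query.lower().split()) & exp.keywords)


def select_experience(policy, query, pool, k=1, min_score=1, seed=0):
    """Return the experiences to inject under one of four illustrative policies."""
    if policy == "none":        # no-experience baseline
        return []
    if policy == "random":      # random control: k items chosen without regard to the query
        return random.Random(seed).sample(pool, min(k, len(pool)))
    if policy == "global":      # unconditional injection of the whole pool
        return list(pool)
    if policy == "retrieval":   # selective injection: up to k items that clear the threshold
        scored = [(overlap_score(query, e), e) for e in pool]
        relevant = [pair for pair in scored if pair[0] >= min_score]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in relevant[:k]]
    raise ValueError(f"unknown policy: {policy}")


def build_prompt(query, experiences):
    """Prepend the selected experiences to the task input."""
    notes = [f"Note: {e.text}" for e in experiences]
    return "\n".join(notes + [f"Input: {query}"])


def prompt_cost(prompt):
    """Crude prompt-length proxy: whitespace-separated word count (an assumption)."""
    return len(prompt.split())


# Toy pool and query, invented for this sketch.
POOL = [
    Experience("Treat coded slang for drugs as a policy violation.",
               frozenset({"slang", "drugs"})),
    Experience("Retry the tool call once on a timeout error.",
               frozenset({"tool", "timeout"})),
    Experience("Quoted abuse in news reporting is usually allowed.",
               frozenset({"quoted", "news"})),
]
QUERY = "post uses slang for drugs"

if __name__ == "__main__":
    for policy in ("none", "random", "global", "retrieval"):
        chosen = select_experience(policy, QUERY, POOL, k=2)
        cost = prompt_cost(build_prompt(QUERY, chosen))
        print(f"{policy}: {len(chosen)} experience(s), prompt cost {cost}")
```

In this sketch the retrieval policy returns fewer than `k` items when few candidates clear the relevance threshold, which is one simple way to express the idea that raising Top-K alone does not help if the extra candidates are irrelevant. The prompt-length proxy only covers the input side; output-side (decode) cost would need separate accounting.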

Problem

Research questions and friction points this paper is trying to address.

external experience serving

quality-cost trade-off

production LLM systems

deployment constraints

serving cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

external experience serving

quality-cost trade-off

retrieval-based injection