HYVE: Hybrid Views for LLM Context Engineering over Machine Data

πŸ“… 2026-04-06
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the poor performance of large language models (LLMs) when processing lengthy, nested, and structurally repetitive machine dataβ€”such as logs, metrics, and telemetry traces. The authors propose HYVE, a novel framework that introduces database-style views into LLM context engineering for the first time. By leveraging request-scoped data storage and hybrid row-column views, HYVE orchestrates preprocessing and postprocessing around model invocation to inject only the most relevant information representations. This approach effectively compresses input size while preserving semantic fidelity, thereby approximating an expanded context window. Experimental results across diverse real-world tasks demonstrate that HYVE reduces token consumption by 50–90%, improves chart-generation accuracy by up to 132%, and lowers latency by as much as 83%, all while maintaining or enhancing output quality.
πŸ“ Abstract
Machine data is central to observability and diagnosis in modern computing systems, appearing in logs, metrics, telemetry traces, and configuration snapshots. When provided to large language models (LLMs), this data typically arrives as a mixture of natural language and structured payloads such as JSON or Python/AST literals. Yet LLMs remain brittle on such inputs, particularly when they are long, deeply nested, and dominated by repetitive structure. We present HYVE (HYbrid ViEw), a framework for LLM context engineering over inputs containing large machine-data payloads, inspired by database management principles. HYVE surrounds model invocation with coordinated preprocessing and postprocessing, centered on a request-scoped datastore augmented with schema information. During preprocessing, HYVE detects repetitive structure in raw inputs, materializes it in the datastore, transforms it into hybrid columnar and row-oriented views, and selectively exposes only the most relevant representation to the LLM. During postprocessing, HYVE either returns the model output directly, queries the datastore to recover omitted information, or performs a bounded additional LLM call for SQL-augmented semantic synthesis. We evaluate HYVE on diverse real-world workloads spanning knowledge QA, chart generation, anomaly detection, and multi-step network troubleshooting. Across these benchmarks, HYVE reduces token usage by 50–90% while maintaining or improving output quality. On structured generation tasks, it improves chart-generation accuracy by up to 132% and reduces latency by up to 83%. Overall, HYVE offers a practical approximation to an effectively unbounded context window for prompts dominated by large machine-data payloads.
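The core preprocessing idea described in the abstract — pivoting repetitive row-oriented payloads into a columnar view and exposing whichever representation is more compact — can be sketched roughly as follows. This is a minimal illustration, not HYVE's implementation: `to_columnar` and `choose_view` are hypothetical names, and the actual system additionally materializes the data in a request-scoped SQL datastore with schema information and applies relevance-driven view selection rather than a plain size comparison.

```python
import json

def to_columnar(records):
    """Pivot a list of homogeneous dicts into a column-oriented view.

    Repeated keys are stated once per column instead of once per record,
    so repetitive structure is paid for only once in the serialized form.
    """
    columns = {}
    for rec in records:
        for key, value in rec.items():
            columns.setdefault(key, []).append(value)
    return columns

def choose_view(records):
    """Return whichever serialization is smaller: the raw row-oriented
    payload or its columnar pivot (a crude stand-in for view selection).
    """
    row_text = json.dumps(records)
    col_text = json.dumps(to_columnar(records))
    return col_text if len(col_text) < len(row_text) else row_text

# Synthetic repetitive log records, as a stand-in for real machine data.
logs = [
    {"ts": 1700000000 + i, "level": "INFO", "svc": "api", "latency_ms": 10 + i}
    for i in range(50)
]
compact = choose_view(logs)
print(len(json.dumps(logs)), len(compact))
```

For repetitive records like these, the columnar view wins because each field name appears once rather than fifty times; on heterogeneous, non-repetitive payloads the row-oriented form can remain smaller, which is why the view choice is made per request.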
Problem

Research questions and friction points this paper is trying to address.

machine data
large language models
context engineering
structured payloads
repetitive structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

HYVE
context engineering
machine data
hybrid views
LLM optimization
πŸ”Ž Similar Papers
No similar papers found.
Jian Tan
Alibaba Group, previously Tenure-Track Faculty with The Ohio State University
Intelligent Database · Stochastic Operations Research · Machine Learning · Large-scale Optimization · Distributed Computing Systems
Fan Bu
Cisco Systems, Inc., San Jose, California, USA
Yuqing Gao
Cisco Systems, Inc., San Jose, California, USA
Dev Khanolkar
Cisco Systems, Inc., San Jose, California, USA
Jason Mackay
Cisco Systems, Inc., San Jose, California, USA
Boris Sobolev
Cisco Systems, Inc., San Jose, California, USA
Lei Jin
Cisco Systems, Inc., San Jose, California, USA
Li Zhang
Cisco Systems, Inc., San Jose, California, USA