Continuous Prompts: LLM-Augmented Pipeline Processing over Unstructured Streams

📅 2025-12-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing LLM frameworks are stateless and execute queries in isolation, making them ill-suited for long-horizon, semantics-aware analysis of unstructured data streams. Method: This paper introduces the first LLM-driven continuous stream processing paradigm, extending Retrieval-Augmented Generation (RAG) to streaming settings and formalizing *continuous semantic operators*. We design a dynamic optimization framework integrating lightweight shadow execution with multi-objective Bayesian optimization (MOBO) to adaptively balance throughput and accuracy. Further, we incorporate tuple batching, operator fusion, and cost-aware scheduling within the VectraFlow system. Results: Experiments demonstrate that VectraFlow responds to load fluctuations in real time, sustains high-accuracy continuous semantic querying under high throughput, achieves significant efficiency gains, and incurs only bounded, controllable accuracy degradation.
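Of the optimizations mentioned, operator fusion works at the prompt level: two consecutive semantic operators (e.g. a filter and a map) are merged into a single LLM request so each batch of tuples costs one call instead of two. A minimal sketch of the idea; the prompt wording and function names are illustrative, not taken from VectraFlow:

```python
from typing import List

def fused_prompt(filter_question: str, map_instruction: str,
                 tuples: List[str]) -> str:
    """Operator fusion: combine a semantic filter and a semantic map
    into one prompt, so the LLM answers both sub-tasks per tuple."""
    lines = "\n".join(f"{i}. {t}" for i, t in enumerate(tuples, 1))
    return (
        "For each numbered item below:\n"
        f"(a) Answer yes/no: {filter_question}\n"
        f"(b) If yes: {map_instruction}\n\n"
        f"{lines}"
    )

prompt = fused_prompt(
    "Does this log line describe an error?",
    "Extract the failing component name.",
    ["kernel: oom-killer invoked", "login succeeded"],
)
```

The single fused prompt replaces two round trips, which is where the efficiency gain (and the potential accuracy loss from a harder combined task) comes from.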

๐Ÿ“ Abstract
Monitoring unstructured streams increasingly requires persistent, semantics-aware computation, yet today's LLM frameworks remain stateless and one-shot, limiting their usefulness for long-running analytics. We introduce Continuous Prompts (CPs), the first framework that brings LLM reasoning into continuous stream processing. CPs extend RAG to streaming settings, define continuous semantic operators, and provide multiple implementations, primarily LLM-based but also including an embedding-based variant. Furthermore, we study two LLM-centric optimizations, tuple batching and operator fusion, to significantly improve efficiency while managing accuracy loss. Because these optimizations inherently trade accuracy for speed, we present a dynamic optimization framework that uses lightweight shadow executions and cost-aware multi-objective Bayesian optimization (MOBO) to learn throughput-accuracy frontiers and adapt plans under probing budgets. We implement CPs in the VectraFlow stream processing system. Using operator-level microbenchmarks and streaming pipelines on real datasets, we show that VectraFlow can adapt to workload dynamics, navigate accuracy-efficiency trade-offs, and sustain persistent semantic queries over evolving unstructured streams.
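As a concrete illustration of a continuous semantic operator with tuple batching, the sketch below runs a semantic filter over a stream with one LLM call per batch rather than per tuple. The LLM is mocked with a keyword predicate; all names and the batch size are hypothetical, not the paper's API:

```python
from typing import Callable, Iterable, Iterator, List

def batched(stream: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group incoming tuples into fixed-size batches."""
    batch: List[str] = []
    for tup in stream:
        batch.append(tup)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

def semantic_filter(stream: Iterable[str], predicate_prompt: str,
                    llm_batch_call: Callable[[str, List[str]], List[bool]],
                    batch_size: int = 4) -> Iterator[str]:
    """Continuous semantic filter: one LLM call per batch of tuples."""
    for batch in batched(stream, batch_size):
        keep_flags = llm_batch_call(predicate_prompt, batch)
        for tup, keep in zip(batch, keep_flags):
            if keep:
                yield tup

# Stand-in for a real LLM call: keep tuples mentioning "outage".
def mock_llm(prompt: str, tuples: List[str]) -> List[bool]:
    return ["outage" in t for t in tuples]

events = ["disk outage in rack 3", "routine heartbeat",
          "network outage", "ok"]
alerts = list(semantic_filter(events, "Is this an incident?", mock_llm))
# alerts == ["disk outage in rack 3", "network outage"]
```

Larger batches amortize per-call overhead and raise throughput, but cram more tuples into one prompt, which is the accuracy side of the trade-off the paper's optimizer navigates.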
Problem

Research questions and friction points this paper is trying to address.

Enables LLM reasoning for continuous unstructured stream processing
Introduces optimizations balancing accuracy and efficiency in stream analytics
Supports persistent semantic queries over evolving unstructured data streams
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous Prompts enable LLM reasoning in stream processing
Extends RAG to streaming with semantic operators and implementations
Uses dynamic optimization with MOBO for accuracy-efficiency trade-offs
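The MOBO-driven adaptation ultimately amounts to choosing an execution plan on a learned throughput-accuracy frontier subject to an accuracy floor. A minimal pure-Python sketch of that selection step; the plan names and numbers are illustrative, not measurements from the paper:

```python
from typing import Dict, List, Optional

Plan = Dict[str, object]

def pareto_frontier(plans: List[Plan]) -> List[Plan]:
    """Keep plans not dominated in both throughput and accuracy."""
    return [
        p for p in plans
        if not any(q["tput"] >= p["tput"] and q["acc"] >= p["acc"]
                   and q is not p for q in plans)
    ]

def pick_plan(plans: List[Plan], min_acc: float) -> Optional[Plan]:
    """Highest-throughput frontier plan meeting the accuracy floor."""
    feasible = [p for p in pareto_frontier(plans) if p["acc"] >= min_acc]
    return max(feasible, key=lambda p: p["tput"]) if feasible else None

# Hypothetical plans probed via shadow execution.
plans = [
    {"name": "fuse+batch8", "tput": 120.0, "acc": 0.86},
    {"name": "batch4",      "tput": 80.0,  "acc": 0.92},
    {"name": "no-opt",      "tput": 30.0,  "acc": 0.95},
]
best = pick_plan(plans, min_acc=0.90)  # picks "batch4"
```

In the actual system, MOBO would propose which plans to probe under a budget and the frontier would be refined online; this sketch only shows the final plan-selection logic.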