🤖 AI Summary
To address high network latency, bandwidth overhead, and data consistency challenges in managing user context (e.g., conversations, preferences) for LLM services in edge computing, this paper proposes a token-sequence-based distributed context storage and synchronization mechanism. The method tokenizes context into lightweight sequences, enabling efficient replication and incremental synchronization across edge nodes, thereby avoiding redundant computation and transmission. Implemented atop open-source infrastructure on commodity hardware, the system demonstrates practical deployability. Experimental evaluation shows that, compared to raw-text baselines, the approach achieves up to a 14.46% reduction in median response latency, up to a 15% decrease in inter-node synchronization overhead, and a median 90% reduction in client request payload size, effectively balancing low latency, low resource consumption, and data consistency.
📝 Abstract
Deploying Large Language Model (LLM) services at the edge benefits latency-sensitive and privacy-aware applications. However, the stateless nature of LLMs makes managing user context (e.g., sessions, preferences) across geo-distributed edge nodes challenging. Existing solutions, such as client-side context storage, often introduce network latency and bandwidth overhead, undermining the advantages of edge deployment.
We propose DisCEdge, a distributed context management system that stores and replicates user context in tokenized form across edge nodes. By maintaining context as token sequences rather than raw text, our system avoids redundant computation and enables efficient data replication. We implement and evaluate an open-source prototype in a realistic edge environment with commodity hardware. We show that DisCEdge improves median response times by up to 14.46% and lowers median inter-node synchronization overhead by up to 15% compared to a raw-text-based system. It also reduces client request sizes by a median of 90% compared to client-side context management, while guaranteeing data consistency.
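The core idea described above, keeping per-user context as an append-only token sequence and shipping only the not-yet-synchronized suffix between edge nodes instead of re-sending raw text, can be sketched as follows. This is an illustrative minimal sketch, not DisCEdge's actual API; all class and method names here are hypothetical.

```python
# Hypothetical sketch of token-sequence context replication between edge
# nodes: each node holds the context as a token list and sends peers only
# the incremental suffix they have not yet acknowledged.

class TokenContextReplica:
    """Per-user context kept as an append-only token sequence."""

    def __init__(self):
        self.tokens = []        # tokenized conversation context
        self.synced_up_to = 0   # index already replicated to the peer

    def append(self, new_tokens):
        """Add freshly tokenized turns (e.g., user message + model reply)."""
        self.tokens.extend(new_tokens)

    def delta_for_peer(self):
        """Return only the tokens the peer has not seen yet."""
        return self.synced_up_to, self.tokens[self.synced_up_to:]

    def ack(self, up_to):
        """Peer confirmed replication up to this index."""
        self.synced_up_to = max(self.synced_up_to, up_to)


def apply_delta(replica, offset, delta):
    """Apply an incremental update on the receiving node (idempotent)."""
    # Only append the portion past what this replica already holds.
    missing = delta[len(replica.tokens) - offset:]
    replica.tokens.extend(missing)


# Usage: node A accumulates context; node B receives only increments.
a, b = TokenContextReplica(), TokenContextReplica()

a.append([101, 2023, 2003])      # example token IDs for the first turn
off, d = a.delta_for_peer()
apply_delta(b, off, d)
a.ack(off + len(d))

a.append([1037, 3231])           # next turn: only two new tokens travel
off, d = a.delta_for_peer()
apply_delta(b, off, d)
a.ack(off + len(d))

assert b.tokens == a.tokens      # replicas converge
```

Because each node stores tokens rather than raw text, a peer that receives a delta can feed the sequence to the model directly without re-tokenizing, which is the source of the computation and bandwidth savings the paper reports.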