ProactiveLLM: Learning Active Interaction for Streaming Large Language Models

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Standard large language models follow a “read-then-generate” paradigm that adds avoidable latency, and existing streaming models struggle to decide on their own when to interact with the stream. This work proposes ProactiveLLM, a framework that gives the model an intrinsic awareness of semantic sufficiency, so that it can decide from partial input whether to keep waiting or respond immediately. The approach combines masked streaming modeling with Synchronized Privileged Self-Distillation (SPSD) and relies only on the model’s internal states, with no external alignment signals or annotated data, which makes the resulting proactive-interaction capability plug-and-play. Experiments on text and speech streaming tasks show that ProactiveLLM significantly reduces interaction latency while preserving generation quality, supporting the value of dynamic proactive interaction.
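The wait-or-respond decision described above can be pictured as a simple loop over the incoming stream. This is a minimal sketch, not the paper's implementation: the names `respond_step` and `read_then_generate`, the scalar sufficiency score, and the fixed threshold are all hypothetical. In ProactiveLLM the score would come from a decision head reading the model's internal states; here it is just a plain callable.

```python
from typing import Callable, Sequence

# Hypothetical interface: maps the observed prefix to a sufficiency score in [0, 1].
# In ProactiveLLM a decision head over internal states would play this role.
SufficiencyFn = Callable[[Sequence[str]], float]


def respond_step(stream: Sequence[str], score: SufficiencyFn, threshold: float = 0.5) -> int:
    """Return how many stream units are read before responding.

    At each step the policy either WAITs (reads one more unit) or RESPONDs.
    If the score never reaches `threshold`, it falls back to responding after
    the full input, which is exactly the read-then-generate baseline.
    """
    for t in range(1, len(stream) + 1):
        if score(stream[:t]) >= threshold:
            return t  # RESPOND after observing t units
    return len(stream)  # stream exhausted: respond with full context


def read_then_generate(stream: Sequence[str]) -> int:
    """Baseline: always consume the whole input before generating."""
    return len(stream)


# Toy usage: a hand-written rule stands in for a learned decision head.
tokens = "what is the capital of japan , please".split()


def toy_score(prefix: Sequence[str]) -> float:
    # Hypothetical rule: the question is "sufficient" once "japan" has arrived.
    return 1.0 if "japan" in prefix else 0.0


print(respond_step(tokens, toy_score), read_then_generate(tokens))  # prints 6 8
```

With the toy rule, the policy responds after 6 of the 8 tokens, whereas read-then-generate waits for all 8. The threshold trades latency against the risk of answering before the input is actually sufficient.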

📝 Abstract

Standard Large Language Models (LLMs) follow a read-then-generate paradigm, causing unnecessary latency and computation. Streaming LLMs alleviate this issue by generating while receiving inputs, but still struggle to decide when to interact with the stream. Existing methods either hard-code interaction timing or rely on costly external alignment signals, such as timing labels, reasoning trajectories, or stronger teachers. In this paper, we propose ProactiveLLM, which achieves active interaction by leveraging the model's endogenous states to guide interaction decisions. The model first learns to perceive semantic sufficiency from partial inputs through two complementary training mechanisms: mask-based streaming modeling and synchronized privileged self-distillation (SPSD). The former applies monotonic random masking to the input during training, simulating progressively revealed streaming inputs and enabling the model to learn local semantic dependencies from partial-input views. The latter aligns the partial-context student view with a full-context teacher view generated by the same evolving model, allowing privileged full-context evidence to guide the student's understanding under incomplete observations. Together, these mechanisms induce endogenous sufficiency cues without requiring external teachers or annotations, providing a versatile foundation for the plug-and-play integration of diverse decision heads. Extensive evaluation across text and speech streaming tasks confirms that ProactiveLLM significantly reduces interaction latency while maintaining quality, validating its capacity for dynamic and active interaction. Code is publicly available at https://github.com/EIT-NLP/StreamingLLM/tree/main/ProactiveLLM.
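The two training mechanisms named in the abstract can be illustrated with a small stdlib-only sketch. It rests on our own reading of the text, not on the released code. We assume that "monotonic random masking" means each target step j may see a random source prefix whose length never decreases as j grows, and we model SPSD as a per-position KL divergence from the full-context teacher distribution to the partial-context student distribution, both produced by the same model. All function names are hypothetical, and gradient handling is out of scope (a real trainer would stop gradients through the teacher view).

```python
import math
import random
from typing import List, Sequence


def monotonic_prefix_mask(n_src: int, n_tgt: int, rng: random.Random) -> List[List[bool]]:
    """Row j marks which source positions target step j may attend to.

    Assumption: visible prefix lengths are drawn uniformly from [1, n_src] and
    sorted, so later target steps never see less input than earlier ones,
    mimicking a progressively revealed stream.
    """
    cuts = sorted(rng.randint(1, n_src) for _ in range(n_tgt))
    return [[i < c for i in range(n_src)] for c in cuts]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
) -> float:
    """Mean per-position KL(teacher || student).

    teacher_logits: the model's outputs given the FULL input (privileged view).
    student_logits: the same model's outputs under the partial-input mask.
    Hypothetical formulation for illustration only.
    """
    per_pos = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_pos) / len(per_pos)


# Toy usage: a 6-unit source stream and 4 target steps.
rng = random.Random(0)
for row in monotonic_prefix_mask(n_src=6, n_tgt=4, rng=rng):
    print("".join("x" if visible else "." for visible in row))

teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
student = [[0.0, 0.0, 0.0], [0.3, 0.2, 0.1]]
print(self_distillation_loss(teacher, student))
```

Under this reading, no separate teacher network is needed: the KL term only pulls the model's partial-view predictions toward what the same model predicts with full context.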

Problem

Research questions and friction points this paper is trying to address.

streaming LLMs

active interaction

interaction latency

semantic sufficiency

endogenous decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

ProactiveLLM

streaming LLM

endogenous interaction