🤖 AI Summary
This work addresses three challenges in long-form dialogue modeling (redundant computation, context loss, and error propagation) by introducing a context-driven incremental compression mechanism. The approach parses dialogues into updatable threads and uses a lightweight retrieve-revise-writeback loop to maintain and share cross-turn information within a compact memory. To capture long-range dependencies, it trains with truncated backpropagation through time (TBPTT). Experiments on long-dialogue benchmarks show that the model achieves superior performance while keeping inference latency and perplexity stable across conversations spanning hundreds of turns, improving both stability and computational efficiency.
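The summary does not specify how threads are represented or matched, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each thread's compression state is a plain vector, each turn arrives already encoded as a vector, retrieval is cosine similarity against stored states, revision is a simple interpolation, and eviction removes the least recently updated thread. The names `DialogueMemory`, `ThreadSlot`, `threshold`, and `alpha` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class ThreadSlot:
    state: list[float]   # hypothetical compressed state of one contextual thread
    last_turn: int       # turn index of the most recent update (used for eviction)


@dataclass
class DialogueMemory:
    """Hypothetical fixed-capacity memory holding one revisable state per thread."""
    capacity: int = 4
    threshold: float = 0.5   # similarity needed to treat a turn as continuing a thread
    alpha: float = 0.5       # weight given to the new turn when revising a stored state
    slots: dict[int, ThreadSlot] = field(default_factory=dict)
    next_id: int = 0

    def retrieve(self, query: list[float]) -> tuple[int | None, float]:
        """Return (thread_id, similarity) of the closest stored thread, or (None, 0.0)."""
        best_id, best_sim = None, 0.0
        for tid, slot in self.slots.items():
            sim = cosine(query, slot.state)
            if best_id is None or sim > best_sim:
                best_id, best_sim = tid, sim
        return best_id, best_sim

    def step(self, turn_vec: list[float], turn_idx: int) -> int:
        """Retrieve, revise, and write back one turn; returns the updated thread id."""
        tid, sim = self.retrieve(turn_vec)
        if tid is not None and sim >= self.threshold:
            old = self.slots[tid].state
            # Revise: blend the stale state with the new turn instead of appending history.
            revised = [(1 - self.alpha) * o + self.alpha * n for o, n in zip(old, turn_vec)]
        else:
            # Start a new thread, evicting the least recently updated one if memory is full.
            if len(self.slots) >= self.capacity:
                lru = min(self.slots, key=lambda k: self.slots[k].last_turn)
                del self.slots[lru]
            tid, revised = self.next_id, list(turn_vec)
            self.next_id += 1
        # Write back: the memory never holds more than `capacity` thread states.
        self.slots[tid] = ThreadSlot(state=revised, last_turn=turn_idx)
        return tid
```

For example, with `capacity=2`, feeding the turn vectors `[1, 0, 0]`, `[0.9, 0.1, 0]`, `[0, 1, 0]`, `[0, 0, 1]` assigns thread ids `[0, 0, 1, 2]`: the second turn revises thread 0 in place, and the fourth evicts thread 0 because it was updated least recently. The point of this toy design is that memory size stays bounded by `capacity` no matter how many turns arrive.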
📝 Abstract
Modern conversational agents condition on an ever-growing dialogue history at each turn, incurring redundant attention and encoding costs that grow with conversation length. Naive truncation or summarization degrades fidelity, while existing context compressors lack cross-turn memory sharing or revision, causing information loss and compounding errors in long dialogues. We revisit context compression under conversational dynamics and empirically demonstrate its fragility. To improve both efficiency and robustness, we introduce Context-Driven Incremental Compression (C-DIC), which treats a conversation as interleaved contextual threads and stores revisable per-thread compression states in a single, compact dialogue memory. At each turn, a lightweight retrieve, revise, and write-back loop shares information across turns and updates stale memories, stabilizing long-horizon behavior. In addition, we adapt truncated backpropagation-through-time (TBPTT) to our multi-turn setting, learning cross-turn dependencies without full-history backpropagation. Extensive experiments on long-form dialogue benchmarks demonstrate the superior performance and efficiency of C-DIC; notably, C-DIC shows stable inference latency and perplexity over hundreds of dialogue turns, supporting a scalable path to high-quality dialogue modeling.
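The abstract does not detail how TBPTT is adapted to the multi-turn setting, so the sketch below illustrates only the generic technique under heavy simplifying assumptions. It assumes a hypothetical scalar memory updated once per turn as m_t = w * m_{t-1} + x_t, a squared-error loss (m_t - y_t)^2 at every turn, and segments of `k` turns. The memory value is carried across segment boundaries, but its dependence on `w` is dropped there (the equivalent of detaching the state), so no gradient flows back more than `k` turns. Within a segment, the derivative dm_t/dw is accumulated forward with the product rule, which yields the same truncated gradient as backpropagating through that segment. `segment_grad` and `train_tbptt` are illustrative names, not part of the paper.

```python
from __future__ import annotations


def segment_grad(w: float, m_in: float, xs: list[float], ys: list[float]):
    """Loss, d(loss)/dw, and final memory for one segment of turns.

    m_in is treated as a constant (detached), so gradients stop at the segment start.
    """
    m, dm = m_in, 0.0              # dm tracks d m_t / d w; zero because m_in is detached
    loss, grad = 0.0, 0.0
    for x, y in zip(xs, ys):
        dm = m + w * dm            # product rule on m_t = w * m_{t-1} + x_t (uses old m)
        m = w * m + x              # hypothetical per-turn memory update
        err = m - y
        loss += err * err          # squared-error loss at every turn
        grad += 2.0 * err * dm
    return loss, grad, m


def train_tbptt(w: float, xs: list[float], ys: list[float], k: int, lr: float = 0.01) -> float:
    """One pass over a dialogue, updating w after every segment of k turns."""
    m = 0.0
    for start in range(0, len(xs), k):
        seg_x, seg_y = xs[start:start + k], ys[start:start + k]
        _, grad, m = segment_grad(w, m, seg_x, seg_y)
        w -= lr * grad / len(seg_x)    # m's value carries forward; its gradient history does not
    return w
```

Each update touches only the last `k` turns, so per-update compute and gradient storage stay bounded however long the dialogue grows; the trade-off is a biased gradient that ignores dependencies spanning more than `k` turns.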