🤖 AI Summary
To address the high computational overhead and poor scalability of large-scale attention models in streaming settings, this paper proposes the first importance sampling framework for attention tailored to streaming scenarios. Drawing inspiration from ℓ₂ sampling, it formulates attention as a tensor-product stream and designs an efficient data structure under the turnstile streaming model, enabling sublinear space usage and near-real-time updates. Theoretically, the method achieves O(1/ε²) space complexity and O(log n) per-update time—significantly outperforming full attention. Empirically, the framework demonstrates strong generalization and scalability across diverse architectures—including Transformer and Longformer—and tasks spanning text and time-series domains. This work establishes a novel paradigm for efficient streaming inference in large language models.
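The core idea, sampling a few keys by importance rather than attending to all of them, can be illustrated with a minimal sketch. This is not the paper's data structure: it is a generic importance-sampled attention estimator in which keys are drawn in proportion to their softmax attention weights, giving an unbiased estimate of the exact attention output from only `m` sampled value rows. The function name and interface are illustrative.

```python
import numpy as np

def sampled_attention(q, K, V, m, rng=None):
    """Estimate the attention output for query q by importance sampling.

    Illustrative sketch (not the paper's method): draw m key indices
    with probability equal to their softmax attention weight, then
    average the sampled value rows. Since E[V[idx]] equals the exact
    weighted sum of value rows, the sample mean is unbiased.
    """
    rng = np.random.default_rng(rng)
    scores = K @ q                       # raw attention scores, shape (n,)
    w = np.exp(scores - scores.max())    # numerically stable softmax
    p = w / w.sum()                      # attention distribution over keys
    idx = rng.choice(len(p), size=m, p=p)
    return V[idx].mean(axis=0)           # unbiased estimate of p @ V
```

With `m` much smaller than the number of keys `n`, only `m` value rows are touched per query, which is the kind of saving a streaming attention sampler targets.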
📝 Abstract
This paper addresses the computational challenges of large-scale attention-based models in artificial intelligence by applying importance sampling methods in the streaming setting. Inspired by the classical definition of the $\ell_2$ sampler and by recent progress on attention schemes in Large Language Models (LLMs), we propose an attention sampler. Our approach significantly reduces the computational burden of traditional attention mechanisms. We analyze the effectiveness of the attention sampler from a theoretical perspective, including its space and update-time costs. Additionally, our framework exhibits scalability and broad applicability across various model architectures and domains.
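For reference, the classical $\ell_2$ sampler the abstract builds on can be stated concretely: a turnstile stream delivers updates $(i, \Delta)$ to a vector $x$, and the sampler outputs index $i$ with probability $x_i^2 / \lVert x \rVert_2^2$. The sketch below is a deliberately offline version that materializes $x$ to make the target distribution explicit; actual streaming $\ell_2$ samplers achieve the same distribution in polylogarithmic space using sketching, which the offline code does not attempt.

```python
import numpy as np

def l2_sample(updates, n, m=1, rng=None):
    """Offline reference implementation of an l2 sampler.

    A turnstile stream is a sequence of updates (i, delta), where delta
    may be negative (deletions are allowed). An l2 sampler returns
    index i with probability x_i^2 / ||x||_2^2. This version stores x
    explicitly for clarity; streaming samplers avoid that via sketches.
    """
    rng = np.random.default_rng(rng)
    x = np.zeros(n)
    for i, delta in updates:         # replay the turnstile stream
        x[i] += delta
    p = x**2 / np.sum(x**2)          # l2 importance distribution
    return rng.choice(n, size=m, p=p)
```

Coordinates with larger magnitude are sampled more often, and zero coordinates are never returned, which is why $\ell_2$ sampling concentrates work on the heavy entries of the stream.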