🤖 AI Summary
To address the quadratic computational complexity that self-attention imposes on Transformer Neural Processes (TNPs), this work proposes a pseudo-token compression framework coupled with an induced-set attention mechanism. By introducing learnable pseudo-token representations and a novel induced-set attention module, the method reduces contextual modeling complexity from quadratic to linear, enabling an explicit, tunable trade-off between accuracy and efficiency. Integrated within a variational inference and meta-learning framework, the approach performs competitively with or surpasses state-of-the-art models on diverse tasks, including 1D regression, image completion, contextual bandits, and Bayesian optimization. Crucially, it delivers substantial gains in training and inference efficiency at scale, enabling TNP-style models to be deployed under controllable computational overhead and thereby addressing their practical scalability bottleneck.
📝 Abstract
Neural Processes (NPs) have gained attention in meta-learning for their ability to quantify uncertainty, together with their rapid prediction and adaptability. However, traditional NPs are prone to underfitting. Transformer Neural Processes (TNPs) significantly outperform existing NPs, yet their applicability in real-world scenarios is hindered by their quadratic computational complexity in the number of context and target data points. To address this, pseudo-token-based TNPs (PT-TNPs) have emerged as a family of NPs that condenses the context data into latent vectors or pseudo-tokens, reducing computational demands. We introduce Induced Set Attentive Neural Processes (ISANPs), which employ Induced Set Attention and an innovative query phase to improve querying efficiency. Our evaluations show that ISANPs perform competitively with TNPs and often surpass state-of-the-art models in 1D regression, image completion, contextual bandits, and Bayesian optimization. Crucially, ISANPs offer a tunable balance between performance and computational complexity, and they scale well to larger datasets where TNPs face limitations.
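To illustrate the complexity reduction the abstract describes, below is a minimal NumPy sketch of induced-set attention over pseudo-tokens (in the spirit of the inducing-point attention of Set Transformer). It is an assumption-laden toy, not the authors' implementation: linear projections, multi-head structure, layer norms, and learned parameters are omitted, and the pseudo-tokens are random rather than trained. The point is the two-step pattern: m pseudo-tokens first attend to all n context points, then the context attends back to the compressed set, so each step costs O(n·m) instead of the O(n²) of full self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: (nq, d), (nk, d), (nk, d) -> (nq, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def induced_set_attention(x, pseudo):
    # Step 1: m pseudo-tokens summarize the n context points  -> O(n*m)
    h = attention(pseudo, x, x)        # (m, d) compressed summary
    # Step 2: context points attend back to the summary        -> O(n*m)
    return attention(x, h, h)          # (n, d)

rng = np.random.default_rng(0)
n, m, d = 512, 16, 32                  # n context points, m << n pseudo-tokens
x = rng.standard_normal((n, d))
pseudo = rng.standard_normal((m, d))   # would be learnable in a real model
out = induced_set_attention(x, pseudo)
print(out.shape)                       # (512, 32): same shape as full self-attention output
```

Choosing m sets the accuracy/efficiency trade-off the abstract refers to: a larger pseudo-token set retains more of the context at higher cost.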