Kalman Bayesian Transformer

📅 2025-09-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the instability and data inefficiency of sequentially fine-tuning Transformers on continually streaming data under distributional shift, this paper proposes an uncertainty-aware Bayesian sequential learning framework. Methodologically, fine-tuning is formulated as posterior inference that integrates Kalman filtering with closed-form moment propagation; a Taylor approximation is applied to the softmax layer for differentiable moment estimation, while the pretrained weights serve as an explicit prior, enabling efficient online updates and quantization-friendly deployment. The key contribution is the first adaptation of Kalman Bayesian Neural Networks to Transformer sequential adaptation, jointly targeting robustness, low latency, and memory efficiency. Evaluated on Decision Transformer tasks, the approach significantly improves generalization under distribution shift, converges stably from only a few new samples, and demonstrates high data efficiency and strong uncertainty calibration.
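The Kalman-filter view of fine-tuning can be made concrete: with the pretrained weights as a Gaussian prior, each new observation triggers a closed-form posterior update whose gain adaptively weighs the new evidence against the prior. The sketch below is a generic linearized Kalman measurement update, not the paper's exact formulation; all variable names, shapes, and values are illustrative.

```python
import numpy as np

def kalman_weight_update(mu, Sigma, H, y, R):
    """One Kalman-filter posterior update of a Gaussian weight belief.

    mu, Sigma : prior mean/covariance of the weights (pretrained weights as prior)
    H         : linearized observation matrix (Jacobian of outputs w.r.t. weights)
    y         : observed target(s) for the new sample
    R         : observation-noise covariance
    """
    S = H @ Sigma @ H.T + R              # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)   # Kalman gain: balances prior vs. new data
    mu_post = mu + K @ (y - H @ mu)      # posterior mean
    Sigma_post = (np.eye(len(mu)) - K @ H) @ Sigma  # posterior covariance
    return mu_post, Sigma_post

# toy example: two "pretrained" weights, one scalar observation
mu0 = np.array([1.0, -0.5])              # pretrained weights act as the prior mean
Sigma0 = 0.1 * np.eye(2)
H = np.array([[1.0, 2.0]])               # local linearization of the model
y = np.array([0.3])
R = np.array([[0.05]])
mu1, Sigma1 = kalman_weight_update(mu0, Sigma0, H, y, R)
```

Because the update is closed-form, a single new sample shrinks the posterior covariance (uncertainty) while moving the mean only as far as the Kalman gain warrants, which is the data-efficiency mechanism the summary refers to.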

📝 Abstract
Sequential fine-tuning of transformers is useful when new data arrive sequentially, especially with shifting distributions. Unlike batch learning, sequential learning demands that training be stabilized despite a small amount of data by balancing new information and previously learned knowledge in the pre-trained models. This challenge is further complicated when training is to be completed in latency-critical environments and learning must additionally quantify uncertainty and be mediated by it. Motivated by these challenges, we propose a novel method that frames sequential fine-tuning as a posterior inference problem within a Bayesian framework. Our approach integrates closed-form moment propagation of random variables, Kalman Bayesian Neural Networks, and Taylor approximations of the moments of softmax functions. By explicitly accounting for pre-trained models as priors and adaptively balancing them against new information based on quantified uncertainty, our method achieves robust and data-efficient sequential learning. The effectiveness of our method is demonstrated through numerical simulations involving sequential adaptation of a decision transformer to tasks characterized by distribution shifts and limited memory resources.
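To make the "Taylor approximations of the moments of softmax functions" concrete: for Gaussian logits, a first-order expansion gives the output mean as the softmax of the logit mean and the output covariance via the softmax Jacobian. This is a sketch of the standard first-order moment-matching idea under that assumption; the paper's expansion may differ in order or detail.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def softmax_moments_taylor(m, S):
    """First-order Taylor approximation of the softmax output moments
    for Gaussian logits z ~ N(m, S)."""
    p = softmax(m)                    # mean: softmax at the logit mean
    J = np.diag(p) - np.outer(p, p)   # Jacobian of softmax evaluated at m
    cov = J @ S @ J.T                 # propagated output covariance
    return p, cov

# toy example: three logits with small isotropic uncertainty
p_mean, p_cov = softmax_moments_taylor(np.array([0.5, -0.2, 0.1]),
                                       0.01 * np.eye(3))
```

A useful sanity check on the approximation: since softmax outputs sum to one, the Jacobian's columns sum to zero, so each row of the propagated covariance sums to zero as well.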
Problem

Research questions and friction points this paper is trying to address.

Sequential fine-tuning with shifting data distributions
Balancing new information against prior knowledge
Uncertainty quantification in latency-critical learning environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian framework for sequential fine-tuning
Kalman Bayesian Neural Networks integration
Adaptive uncertainty-based balancing mechanism
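The closed-form moment propagation underpinning these contributions reduces to simple matrix algebra for affine layers: a Gaussian input stays Gaussian under y = Wx + b. A minimal sketch with illustrative shapes (nonlinearities in the actual model require approximations such as the softmax Taylor expansion above):

```python
import numpy as np

def propagate_affine(m, S, W, b):
    """Propagate input moments x ~ N(m, S) through y = W x + b in closed form:
    the output is exactly Gaussian with mean W m + b and covariance W S W^T."""
    return W @ m + b, W @ S @ W.T

# toy example: standard-normal input through a diagonal affine layer
m_out, S_out = propagate_affine(
    m=np.zeros(2), S=np.eye(2),
    W=np.array([[1.0, 0.0], [0.0, 2.0]]), b=np.array([1.0, 1.0]))
```

Chaining such propagations layer by layer is what lets the method carry uncertainty through the network without Monte Carlo sampling, which is key for the latency-critical setting described above.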
Haoming Jing
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Oren Wright
Carnegie Mellon University
machine learning, signal processing, artificial intelligence

José M. F. Moura
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Yorie Nakahira
Assistant Professor, Carnegie Mellon University
Control and learning, optimization, autonomous systems, language-guided control