Revisiting Bayesian Model Averaging in the Era of Foundation Models

📅 2025-05-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Enhancing classification performance under the foundation-model paradigm. Method: scalable Bayesian Model Averaging (BMA) and Optimization-based Model Averaging (OMA) frameworks that freeze pretrained vision/language model features and learn only the posterior distribution over linear classifier weights. Full BMA is adapted to large-scale foundation models for the first time, and a sampling-free OMA achieves efficient, principled ensemble weighting by minimizing predictive entropy. Contributions/Results: the approach unifies image and text classification, substantially outperforms single models and conventional ensembles across multiple benchmarks, enables plug-and-play integration of new foundation models, and incurs negligible inference overhead.


📝 Abstract
We revisit the classical, full-fledged Bayesian model averaging (BMA) paradigm to ensemble pre-trained and/or lightly fine-tuned foundation models and enhance classification performance on image and text data. To make BMA tractable with foundation models, we introduce trainable linear classifiers that take frozen features from the pre-trained foundation models as inputs. The model posteriors over these linear classifiers indicate which linear heads and frozen features are better suited to a given dataset, yielding a principled model-ensembling method. Furthermore, we propose a computationally cheaper, optimizable model averaging scheme (OMA). In OMA, we directly optimize the model ensemble weights, which play the same role as the posterior-based weights in BMA, by reducing the surprise (expected entropy) of the ensembled models' predictions. As foundation models continue to improve rapidly, these approaches allow future, possibly significantly better foundation models to be incorporated to boost performance on challenging classification tasks.
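The abstract describes OMA as directly optimizing ensemble weights by minimizing the expected entropy of the ensembled predictions. The paper's exact formulation is not given here; below is a minimal NumPy sketch of that idea under stated assumptions: per-model class probabilities from the frozen-feature linear heads are precomputed, the weights are parameterized by a softmax to stay on the simplex, and `oma_weights` with its plain gradient-descent loop is an illustrative stand-in, not the authors' implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_predict(probs_per_model, weights):
    # probs_per_model: (M, N, C) class probabilities from M linear heads
    # weights: (M,) non-negative, summing to 1
    return np.einsum("m,mnc->nc", weights, probs_per_model)

def oma_weights(probs_per_model, steps=200, lr=0.5):
    """Entropy-minimizing ensemble weights (illustrative sketch).

    Parameterize weights as softmax(a) and run gradient descent on
    the mean entropy of the ensembled predictive distribution.
    """
    M, N, _ = probs_per_model.shape
    a = np.zeros(M)  # unconstrained logits
    for _ in range(steps):
        w = softmax(a)
        p = ensemble_predict(probs_per_model, w)        # (N, C)
        # d(mean entropy)/dp = -(log p + 1) / N
        g_p = -(np.log(p + 1e-12) + 1.0) / N
        # chain rule: dp/dw_m = probs_per_model[m]
        g_w = np.einsum("mnc,nc->m", probs_per_model, g_p)
        # backprop through the softmax parameterization
        g_a = w * (g_w - np.dot(w, g_w))
        a -= lr * g_a
    return softmax(a)
```

In this toy setup, a head producing confident (low-entropy) predictions attracts more ensemble weight than one producing near-uniform predictions, which is the intended "reduce the surprise" behavior; the full method would combine this with the BMA posterior view over linear heads.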
Problem

Research questions and friction points this paper is trying to address.

Enhancing classification performance using Bayesian model averaging with foundation models
Making BMA tractable via trainable linear classifiers on frozen features
Proposing optimizable model averaging (OMA) for efficient ensemble weight optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble pre-trained foundation models with BMA
Trainable linear classifiers on frozen features
Optimizable model averaging scheme (OMA)