🤖 AI Summary
Enhancing classification performance under the foundation-model paradigm. Method: We propose scalable Bayesian Model Averaging (BMA) and Optimization-based Model Averaging (OMA) frameworks: freezing pretrained vision/language model features and learning only the posterior distribution over linear-classifier weights; adapting full BMA to large-scale foundation models for the first time; and introducing a sampling-free OMA scheme that achieves efficient, principled ensemble weighting by minimizing predictive entropy. Contributions/Results: The approach unifies image and text classification, substantially outperforms single models and conventional ensembles across multiple benchmarks, enables plug-and-play integration of new foundation models, and incurs negligible inference overhead.
📝 Abstract
We revisit the classical, full-fledged Bayesian model averaging (BMA) paradigm to ensemble pre-trained and/or lightly fine-tuned foundation models, enhancing classification performance on image and text data. To make BMA tractable with foundation models, we introduce trainable linear classifiers that take frozen features from the pre-trained foundation models as inputs. The model posteriors over these linear classifiers indicate which linear heads, and hence which frozen features, are better suited to a given dataset, yielding a principled model-ensembling method. Furthermore, we propose a computationally cheaper, optimizable model averaging scheme (OMA). In OMA, we directly optimize the model ensemble weights, which play the same role as the posterior-based weights in BMA, by reducing the surprise (expected entropy) of the ensembled models' predictions. As foundation models continue to develop rapidly, these approaches make it possible to incorporate future, possibly significantly better foundation models to enhance performance on challenging classification tasks.
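The OMA idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each ensemble member's class probabilities on a dataset are precomputed (in the paper's setting, each member would be a linear head on frozen foundation-model features), and it learns softmax-parameterized ensemble weights by minimizing the mean predictive entropy of the weighted ensemble. The function names, the finite-difference optimizer, and the hyperparameters are all illustrative choices.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over a 1-D array of weight logits
    e = np.exp(z - z.max())
    return e / e.sum()

def ensemble_entropy(z, probs):
    """Mean predictive entropy of the weighted ensemble.

    probs: shape (M, N, C) -- per-model class probabilities
           for M models, N examples, C classes
    z:     unconstrained logits; softmax(z) gives the M ensemble weights
    """
    w = softmax(z)
    p = np.einsum('m,mnc->nc', w, probs)   # weighted ensemble prediction per example
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

def fit_oma_weights(probs, steps=200, lr=0.5, eps=1e-4):
    """Optimize ensemble weights by minimizing predictive entropy
    (central-difference gradient descent on the weight logits)."""
    M = probs.shape[0]
    z = np.zeros(M)                        # start from uniform ensemble weights
    for _ in range(steps):
        grad = np.empty(M)
        for i in range(M):
            zp, zm = z.copy(), z.copy()
            zp[i] += eps
            zm[i] -= eps
            grad[i] = (ensemble_entropy(zp, probs)
                       - ensemble_entropy(zm, probs)) / (2 * eps)
        z -= lr * grad
    return softmax(z)
```

On a toy example with one confident model and one near-uniform model, the learned weights concentrate on the confident member, matching the intuition that lower-surprise predictions earn larger ensemble weight; a real system would obtain each `probs` slice from a trained linear head on frozen features and could use autodiff instead of finite differences.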