Generalization Guarantees for Multi-View Representation Learning and Application to Regularization via Gaussian Product Mixture Prior

📅 2025-04-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses representation learning in distributed multi-view settings, where agents must extract features from their local views alone, without explicit coordination, such that the union of these representations is necessary and sufficient for decoding the joint label. We derive novel data-dependent, symmetric-prior generalization bounds (in the Minimum Description Length sense) and turn them into a regularizer that combines relative-entropy-based generalization analysis, variational inference, and multi-view collaborative regularization. With data-dependent Gaussian mixture priors, a weighted attention mechanism emerges naturally, and in the multi-view case the Gaussian product mixture prior is shown to implicitly encourage redundant feature extraction. Experiments demonstrate: (i) superior performance over VIB and CDVIB on single-view tasks; (ii) significant generalization gains from controlled redundancy in multi-view settings; and (iii) strong alignment between theoretical guarantees and empirical results.
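To make the single-view version of this recipe concrete, below is a minimal PyTorch sketch of a VIB-style objective whose KL regularizer targets a learnable Gaussian mixture prior with sample-dependent, attention-like component weights. The class name MixturePriorVIB, the network sizes, the proximity-based weighting heuristic, the single-sample Monte Carlo KL estimate, and the coefficient beta are illustrative assumptions, not the paper's implementation.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

LOG2PI = math.log(2.0 * math.pi)

def diag_gauss_logpdf(z, mu, logvar):
    # Log-density of a diagonal Gaussian, summed over the feature dimension.
    return (-0.5 * ((z - mu) ** 2 / logvar.exp() + logvar + LOG2PI)).sum(-1)

class MixturePriorVIB(nn.Module):
    def __init__(self, x_dim, z_dim, n_classes, n_components, beta=1e-3):
        super().__init__()
        self.beta = beta
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * z_dim))
        self.decoder = nn.Linear(z_dim, n_classes)
        # Learnable Gaussian mixture prior: one (mean, log-variance) per component.
        self.prior_mu = nn.Parameter(torch.randn(n_components, z_dim))
        self.prior_logvar = nn.Parameter(torch.zeros(n_components, z_dim))

    def forward(self, x, y):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        ce = F.cross_entropy(self.decoder(z), y)

        log_q = diag_gauss_logpdf(z, mu, logvar)                # shape (batch,)
        # Log-density of every prior component at z: shape (batch, n_components).
        log_comp = diag_gauss_logpdf(z.unsqueeze(1),
                                     self.prior_mu.unsqueeze(0),
                                     self.prior_logvar.unsqueeze(0))
        # Attention-like, sample-dependent weights over components
        # (a simple proximity heuristic standing in for the paper's weighting).
        log_w = F.log_softmax(log_comp, dim=-1)
        log_prior = torch.logsumexp(log_comp + log_w, dim=-1)   # mixture log-density

        # Single-sample Monte Carlo estimate of KL(q(z|x) || mixture prior).
        kl = (log_q - log_prior).mean()
        return ce + self.beta * kl

With a single component fixed to the standard normal, the regularizer reduces (up to the Monte Carlo estimate) to the usual VIB KL term, which is one way to see the mixture prior as a strict generalization of vanilla VIB.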

📝 Abstract
We study the problem of distributed multi-view representation learning. In this problem, each of $K$ agents observes one distinct, possibly statistically correlated, view and independently extracts from it a suitable representation, such that a decoder that receives all $K$ representations correctly estimates the hidden label. In the absence of any explicit coordination between the agents, a central question is: what should each agent extract from its view that is necessary and sufficient for a correct estimation at the decoder? In this paper, we investigate this question from a generalization-error perspective. First, we establish several generalization bounds in terms of the relative entropy between the distribution of the representations extracted from the training and "test" datasets and a data-dependent symmetric prior, i.e., the Minimum Description Length (MDL) of the latent variables for all views and for both the training and test datasets. Then, we use the obtained bounds to devise a regularizer, and we investigate in depth the question of selecting a suitable prior. In particular, we show, and illustrate through experiments, that our data-dependent Gaussian mixture priors with judiciously chosen weights lead to good performance. For single-view settings (i.e., $K=1$), our approach outperforms the prior-art Variational Information Bottleneck (VIB) and Category-Dependent VIB (CDVIB) approaches. Interestingly, a weighted attention mechanism emerges naturally in this setting. Finally, for the multi-view setting, we show that choosing the joint prior as a Gaussian product mixture induces a Gaussian mixture marginal prior for each view and implicitly encourages the agents to extract and output redundant features, a finding which is somewhat counter-intuitive.
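To unpack the last claim of the abstract, here is a short sketch of the product-mixture structure, in notation of my own choosing ($u_1,\dots,u_K$ for the $K$ latent representations, $M$ components with weights $w_m$ and per-view parameters $\mu_{k,m}$, $\Sigma_{k,m}$), none of which is taken from the paper:

$$q(u_1,\dots,u_K) \;=\; \sum_{m=1}^{M} w_m \prod_{k=1}^{K} \mathcal{N}\!\left(u_k;\, \mu_{k,m}, \Sigma_{k,m}\right), \qquad \sum_{m=1}^{M} w_m = 1 .$$

Marginalizing out all latents except $u_k$ integrates each of the other Gaussian factors to one, so every view inherits a Gaussian mixture marginal prior with the same weights:

$$q(u_k) \;=\; \sum_{m=1}^{M} w_m\, \mathcal{N}\!\left(u_k;\, \mu_{k,m}, \Sigma_{k,m}\right).$$

Because the mixture weights are shared across views, a component that fits one agent's representation well is also the component against which the other agents' representations are scored, which is consistent with the redundancy-encouraging behaviour the abstract describes.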
Problem

Research questions and friction points this paper is trying to address.

Distributed multi-view representation learning without agent coordination
Generalization bounds via data-dependent symmetric priors
Optimal prior selection for Gaussian mixture regularization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view learning with a Gaussian product mixture prior (see the sketch after this list)
Generalization bounds via Minimum Description Length
Weighted attention mechanism for feature extraction
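As flagged in the first item above, a hypothetical multi-view counterpart of the earlier single-view sketch is given below: $K$ per-view encoders, a decoder that receives all $K$ representations, and a Gaussian product mixture prior whose mixture weights are shared across views, so that the KL regularizer couples the components selected by the different agents. All names (ProductMixturePriorMVIB, _log_normal), architectures, and hyperparameters are illustrative assumptions rather than the paper's implementation; the KL is again a single-sample Monte Carlo estimate.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

LOG2PI = math.log(2.0 * math.pi)

class ProductMixturePriorMVIB(nn.Module):
    def __init__(self, x_dims, z_dim, n_classes, n_components, beta=1e-3):
        super().__init__()
        self.beta, self.K = beta, len(x_dims)
        # One encoder per view; the agents do not communicate with each other.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 2 * z_dim))
            for d in x_dims)
        # The decoder sees all K representations.
        self.decoder = nn.Linear(self.K * z_dim, n_classes)
        # Product mixture prior: shared component weights, per-view Gaussians.
        self.prior_logits = nn.Parameter(torch.zeros(n_components))
        self.prior_mu = nn.Parameter(torch.randn(self.K, n_components, z_dim))
        self.prior_logvar = nn.Parameter(torch.zeros(self.K, n_components, z_dim))

    @staticmethod
    def _log_normal(z, mu, logvar):
        # Diagonal Gaussian log-density, summed over the feature dimension.
        return (-0.5 * ((z - mu) ** 2 / logvar.exp() + logvar + LOG2PI)).sum(-1)

    def forward(self, views, y):   # views: list of K tensors of shape (batch, x_dims[k])
        zs, log_q, log_comp = [], 0.0, 0.0
        for k, (enc, x) in enumerate(zip(self.encoders, views)):
            mu, logvar = enc(x).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            zs.append(z)
            log_q = log_q + self._log_normal(z, mu, logvar)              # (batch,)
            log_comp = log_comp + self._log_normal(                      # (batch, M)
                z.unsqueeze(1), self.prior_mu[k].unsqueeze(0),
                self.prior_logvar[k].unsqueeze(0))
        # Log-density of the product mixture prior at (z_1, ..., z_K);
        # the shared mixture weights couple the views.
        log_prior = torch.logsumexp(
            log_comp + F.log_softmax(self.prior_logits, dim=0), dim=-1)
        kl = (log_q - log_prior).mean()   # single-sample Monte Carlo KL estimate
        ce = F.cross_entropy(self.decoder(torch.cat(zs, dim=-1)), y)
        return ce + self.beta * kl

Because every component ties one Gaussian per view to a single shared weight, increasing the prior log-density nudges the per-view representations toward mutually consistent components, which is one way to read the redundancy-encouraging effect described in the abstract.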
👥 Authors
Milad Sefidgaran
Senior ML Researcher
Machine Learning · Deep Learning · Information Theory
Abdellatif Zaidi
Université Gustave Eiffel, France; Huawei Paris Research Center, France
Piotr Krasnowski
Huawei Paris Research Center, France