Rethinking Incompleteness: Formalizing Protocol Divergence and Train-Once Learning for Robust IMVC

📅 2026-06-03

🤖 AI Summary

This work addresses a limitation of existing evaluation paradigms for incomplete multi-view clustering (IMVC): they retrain a model for each missing pattern and characterize incompleteness by missing rate alone, which can misjudge model robustness. The authors introduce the concept of "incompleteness divergence," showing that protocols with identical missing rates can differ by up to 50× in their proportion of fully observed samples, and prove that a broad class of reconstruction-based objectives becomes structurally ill-posed when this proportion falls below a critical threshold. To overcome this, they propose CRAFT, a Transformer that treats each sample independently and fuses a variable number of observed views through attention masking, so that a single model trained once on complete data generalizes to diverse missing patterns at inference time. Evaluated on seven benchmarks, CRAFT matches or outperforms baselines trained per configuration while reducing training overhead by 8.8×, supporting the view that robustness can be built into the architecture rather than the loss function.
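
To make the divergence concrete, the following sketch builds two toy missing-data protocols that share the same missing rate but leave very different proportions of fully observed samples. It is illustrative only. It assumes that "missing rate" means the fraction of absent (sample, view) entries, which may differ from the paper's definition, and the names `concentrated_mask`, `spread_mask`, and `describe` are hypothetical; they are not taken from CRAFT or the imvc-audit toolkit.

```python
import numpy as np


def concentrated_mask(n_samples, n_views, n_missing):
    """Hypothetical protocol A: pack missing entries into as few samples as possible.

    Each affected sample keeps only its first view and loses the other n_views - 1.
    """
    per_sample = n_views - 1
    if n_missing % per_sample != 0 or n_missing // per_sample > n_samples:
        raise ValueError("n_missing must be a multiple of n_views - 1 and fit in the data")
    mask = np.ones((n_samples, n_views), dtype=bool)  # True = view observed
    mask[: n_missing // per_sample, 1:] = False
    return mask


def spread_mask(n_samples, n_views, n_missing):
    """Hypothetical protocol B: at most one missing view per sample."""
    if n_missing > n_samples:
        raise ValueError("n_missing must not exceed n_samples")
    mask = np.ones((n_samples, n_views), dtype=bool)
    rows = np.arange(n_missing)
    mask[rows, rows % n_views] = False  # rotate which view is dropped
    return mask


def describe(mask):
    """Return (missing rate over entries, fraction of fully observed samples)."""
    missing_rate = 1.0 - mask.mean()
    complete_fraction = mask.all(axis=1).mean()
    return missing_rate, complete_fraction


for name, build in [("concentrated", concentrated_mask), ("spread", spread_mask)]:
    rate, complete = describe(build(120, 4, 96))
    print(f"{name:>12}: missing rate = {rate:.3f}, complete samples = {complete:.3f}")
```

With 120 samples, 4 views, and 96 missing entries, both toy protocols have a missing rate of 0.200, yet roughly 73% of samples are complete under the first and only 20% under the second.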

📝 Abstract

Standard IMVC evaluation retrains separate models for different missing-data configurations. We show that this paradigm obscures a fundamental vulnerability: missing rate alone is insufficient to characterize data incompleteness. Specifically, we show that protocols with identical nominal missing rates can differ by up to $50\times$ in their proportion of fully observed samples, inducing drastically different learning regimes. We formalize this phenomenon as incompleteness divergence, providing measures that capture structural disparities across missing-data protocols. We further prove that for a broad class of reconstruction-based objectives, learning becomes structurally ill-posed when the proportion of complete samples falls below a critical threshold, leading to near-random performance. To bypass this theoretical bound, we propose CRAFT (Complete-data Robust Attention-masked Fusion Transformer). CRAFT shifts the burden of robustness from the loss function to the architecture via two key properties: (i) per-sample independence, which removes reliance on complete-sample co-occurrence, and (ii) mask-aware variable-length fusion, which aggregates only observed views through attention masking. This design allows a single model, trained once on complete data, to generalize to diverse missing patterns at inference time without retraining. Extensive experiments on seven benchmarks show that CRAFT matches or outperforms per-configuration baselines while reducing training overhead by $8.8\times$, demonstrating that robustness to missing data can be achieved as an inherent architectural property. Code (CRAFT) and our imvc-audit toolkit are available at https://anonymous.4open.science/r/CRAFT-BF80/ and https://anonymous.4open.science/r/imvc-audit-8263/.
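
The idea of aggregating only observed views through attention masking can be sketched with a generic masked attention pooling step. This is not the CRAFT architecture, whose details are not given here; it is a minimal NumPy illustration under stated assumptions: a single hypothetical query vector `query`, one embedding per view, and a boolean `mask` marking which views are observed. The function name `masked_attention_fusion` is likewise hypothetical.

```python
import numpy as np


def masked_attention_fusion(views, mask, query):
    """Fuse per-view embeddings for each sample, attending only to observed views.

    views: (n_samples, n_views, dim) embeddings; entries for missing views are ignored.
    mask:  (n_samples, n_views) boolean, True where the view is observed.
    query: (dim,) hypothetical fusion query (learned in a real model).
    """
    if not mask.any(axis=1).all():
        raise ValueError("every sample needs at least one observed view")
    dim = views.shape[-1]
    # Zero-fill missing slots so placeholder values (even NaN) cannot leak through.
    views = np.where(mask[..., None], views, 0.0)
    scores = views @ query / np.sqrt(dim)              # (n_samples, n_views)
    scores = np.where(mask, scores, -np.inf)           # missing views get zero weight
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    fused = np.einsum("nv,nvd->nd", weights, views)    # weighted sum over views
    return fused, weights


rng = np.random.default_rng(0)
views = rng.normal(size=(3, 4, 5))
mask = np.array([[1, 1, 1, 1],
                 [1, 0, 1, 0],
                 [0, 0, 1, 0]], dtype=bool)
fused, weights = masked_attention_fusion(views, mask, rng.normal(size=5))
print(fused.shape, weights.round(3))
```

Each sample is fused on its own, using only its own mask, so the same function handles any number of observed views per sample. That is the property that lets one set of parameters serve different missing patterns at inference time in this sketch.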

Problem

Research questions and friction points this paper is trying to address.

incompleteness divergence

missing-data protocols

incomplete multi-view clustering

structural ill-posedness

complete-sample proportion

Innovation

Methods, ideas, or system contributions that make the work stand out.

incompleteness divergence

train-once learning

mask-aware fusion