🤖 AI Summary
Existing matrix completion methods are designed mainly for real-valued data and often fail to model the discrete, non-ordinal nature of categorical data. To address this limitation, this work proposes the LCMC framework, which encodes each categorical entry as a one-hot vector in a binary tensor and uses a nested (double-loop) optimization scheme: an outer loop adaptively estimates the latent dimension using feedback from the inner loop, while the inner loop reconstructs the original categorical matrix via tensor factorization. A split-merge-refine strategy and an adaptive data reduction technique are added to improve scalability and robustness. Experiments on synthetic and real-world datasets, including viral quasispecies reconstruction, indicate that LCMC achieves higher completion accuracy and computational efficiency than the existing methods it is compared against.
📝 Abstract
Matrix completion has been extensively studied for real-valued data, but existing methods are often limited in handling categorical variables. We propose LCMC, a double-loop optimization framework for categorical matrix completion via latent factorization based on a binary tensor representation. In this setting, each categorical entry is encoded as a one-hot vector along a third tensor mode, thereby preserving its discrete, non-ordinal nature. The outer loop adaptively estimates the latent dimension by iteratively updating it with feedback from the inner loop, while the inner loop reconstructs the categorical matrix through tensor factorization and is supported by a corresponding theoretical analysis. To further improve scalability and robustness, we introduce enhancements including a split-merge-refine strategy and an adaptive data reduction technique. Experiments on synthetic and real-world datasets in viral quasispecies reconstruction demonstrate that LCMC achieves superior accuracy and efficiency compared to existing methods.
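The binary tensor representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `one_hot_tensor` and `decode`, the use of integer category codes, and the use of `None` for unobserved entries are all assumptions made for illustration. The sketch only shows the encoding step and a simple argmax decoding; it does not implement the double-loop optimization or the tensor factorization itself.

```python
from typing import List, Optional, Sequence, Tuple


def one_hot_tensor(
    matrix: Sequence[Sequence[Optional[int]]], num_categories: int
) -> Tuple[List[List[List[int]]], List[List[bool]]]:
    """Encode an integer-coded categorical matrix as a binary tensor.

    Assumed conventions (hypothetical, not from the paper):
    categories are coded 0..num_categories-1 and None marks a missing entry.
    Returns (tensor, mask), where tensor[i][j] is a one-hot vector along the
    third mode and mask[i][j] is True for observed entries.
    """
    tensor, mask = [], []
    for row in matrix:
        tensor_row, mask_row = [], []
        for entry in row:
            fiber = [0] * num_categories  # all zeros for a missing entry
            if entry is not None:
                if not 0 <= entry < num_categories:
                    raise ValueError(f"category code out of range: {entry}")
                fiber[entry] = 1
            tensor_row.append(fiber)
            mask_row.append(entry is not None)
        tensor.append(tensor_row)
        mask.append(mask_row)
    return tensor, mask


def decode(scores: Sequence[Sequence[Sequence[float]]]) -> List[List[int]]:
    """Map each third-mode fiber of scores to the index of its largest value."""
    return [
        [max(range(len(fiber)), key=fiber.__getitem__) for fiber in row]
        for row in scores
    ]


# A 2 x 3 matrix with 4 categories and two missing entries.
example = [[0, 2, None], [1, None, 3]]
tensor, mask = one_hot_tensor(example, num_categories=4)
```

Because each category occupies its own coordinate along the third mode, no ordering between categories is implied, which is the property the abstract refers to as preserving the discrete, non-ordinal nature of the data. In a completion setting, `decode` would be applied to a reconstructed real-valued tensor rather than to the binary input.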