BiSSLB: Binary Spike-and-Slab Lasso Biclustering

📅 2026-03-18
🤖 AI Summary
This work proposes a Bayesian logistic matrix factorization approach to biclustering binary data. Existing binary biclustering methods often perform poorly under noise, scale badly, or bias results toward prespecified bicluster structures. The proposed method models overlapping biclusters with a spike-and-slab Lasso prior and uses an Indian Buffet Process (IBP) prior to infer the number of biclusters from the data, without requiring prior knowledge of the noise level or bicluster characteristics. A coordinate ascent algorithm with proximal steps keeps the computation scalable. In simulations and on two real datasets (HapMap SNP data and a human protein-protein interaction network), the method is reported to outperform other state-of-the-art binary biclustering methods when the data are very noisy.

📝 Abstract
Biclustering is a powerful unsupervised learning technique for simultaneously identifying coherent subsets of rows and columns in a data matrix, thus revealing local patterns that may not be apparent in global analyses. However, most biclustering methods are developed for continuous data and are not applicable to binary datasets such as single-nucleotide polymorphism (SNP) or protein-protein interaction (PPI) data. Existing biclustering algorithms for binary data often struggle to recover biclustering patterns under noise, face scalability issues, and/or bias the final results towards biclusters of a particular size or characteristic. We propose a Bayesian method for biclustering binary datasets called Binary Spike-and-Slab Lasso Biclustering (BiSSLB). Our method is robust to noise and allows for overlapping biclusters of various sizes without prior knowledge of the noise level or bicluster characteristics. BiSSLB is based on a logistic matrix factorization model with spike-and-slab priors on the latent spaces. We further incorporate an Indian Buffet Process (IBP) prior to automatically determine the number of biclusters from the data. We develop a novel coordinate ascent algorithm with proximal steps which allows for scalable computation. The performance of our proposed approach is assessed through simulations and two real applications on HapMap SNP and Homo sapiens PPI data, where BiSSLB is shown to outperform other state-of-the-art binary biclustering methods when the data are very noisy.
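
To make the idea of logistic matrix factorization with proximal steps concrete, here is a minimal sketch. It is not the BiSSLB algorithm: it assumes a fixed number of factors `K` (no IBP prior), a plain L1 penalty with a single fixed weight `lam` in place of the spike-and-slab Lasso's adaptive shrinkage, and simple alternating proximal gradient updates rather than the paper's coordinate ascent scheme. All names (`fit_sparse_logistic_mf`, `soft_threshold`, `lam`, `step`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def sigmoid(z):
    # Clip the logits to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def soft_threshold(x, t):
    # Proximal operator of t * |x|: shrink toward zero, set small entries to exactly 0.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fit_sparse_logistic_mf(Y, K=3, lam=0.05, step=0.01, n_iter=300, seed=0):
    """Assumed model: P(Y[i, j] = 1) = sigmoid(A[i] . B[j]), with L1 penalties on A and B.

    Nonzero entries in column k of A (rows) and of B (columns) can be read
    as membership in a candidate bicluster k.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    A = 0.1 * rng.standard_normal((n, K))
    B = 0.1 * rng.standard_normal((p, K))
    for _ in range(n_iter):
        # Gradient of the Bernoulli log-likelihood w.r.t. the logits is Y - sigmoid(logits).
        R = Y - sigmoid(A @ B.T)
        A = soft_threshold(A + step * (R @ B), step * lam)
        R = Y - sigmoid(A @ B.T)
        B = soft_threshold(B + step * (R.T @ A), step * lam)
    return A, B

# Usage: a noisy binary matrix with one planted block of ones.
rng = np.random.default_rng(1)
Y = (rng.random((40, 30)) < 0.1).astype(float)   # background noise
Y[:15, :10] = (rng.random((15, 10)) < 0.9)       # planted bicluster
A, B = fit_sparse_logistic_mf(Y, K=2)
print("nonzero row loadings per factor:", (A != 0).sum(axis=0))
```

The soft-thresholding step is what produces exact zeros in the loadings, which is why a sparsity-inducing penalty lets factor columns be interpreted as bicluster memberships.
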
Problem

Research questions and friction points this paper is trying to address.

biclustering
binary data
noise robustness
scalability
cluster bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Binary biclustering
Spike-and-slab prior
Indian Buffet Process
Logistic matrix factorization
Coordinate ascent with proximal steps
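
As a rough illustration of why an Indian Buffet Process prior leaves the number of biclusters open, the sketch below draws a binary membership matrix from the standard IBP generative scheme: row `i` (1-based) joins each existing feature `k` with probability `m_k / i`, where `m_k` is the number of earlier rows using it, and then opens `Poisson(alpha / i)` new features. This shows only the textbook prior; how BiSSLB couples the IBP with its spike-and-slab Lasso prior is not specified here, and the function name `sample_ibp` and the parameter `alpha` are assumptions for this example.

```python
import numpy as np

def sample_ibp(n_rows, alpha, seed=0):
    """Draw a binary matrix Z from the Indian Buffet Process.

    Z[i, k] = 1 means row i uses feature k. The number of columns is random
    and grows with alpha and n_rows.
    """
    rng = np.random.default_rng(seed)
    Z = np.zeros((n_rows, 0), dtype=int)
    for i in range(n_rows):
        # m_k: number of earlier rows that already use feature k.
        counts = Z[:i].sum(axis=0)
        # Join each existing feature with probability m_k / (i + 1) (i is 0-based here).
        Z[i, :] = rng.random(Z.shape[1]) < counts / (i + 1)
        # Open a Poisson-distributed number of brand-new features for this row.
        n_new = rng.poisson(alpha / (i + 1))
        new_cols = np.zeros((n_rows, n_new), dtype=int)
        new_cols[i, :] = 1
        Z = np.hstack([Z, new_cols])
    return Z

# Usage: the column count differs from draw to draw, so it is not fixed in advance.
for seed in range(3):
    Z = sample_ibp(n_rows=8, alpha=2.0, seed=seed)
    print("seed", seed, "-> features:", Z.shape[1])
```

Because popular features attract more rows while new features become rarer as `i` grows, the prior favors a modest number of shared features without capping it.
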