🤖 AI Summary
This work addresses a challenge in drug discovery: auxiliary (side) features such as chemical and genomic data are high-dimensional, noisy, and uneven in relevance, which limits both predictive performance and model interpretability. The study introduces Bayesian Variable Selection-Guided Inductive Matrix Completion (BVSIMC), a model that integrates Bayesian variable selection into an inductive matrix completion framework. By learning sparse latent embeddings, the method identifies and prioritizes the most informative side features. Evaluated in simulation studies and on two applications, Mycobacterium tuberculosis drug resistance prediction and computational drug repositioning, the model outperforms several state-of-the-art methods in predictive performance. In the two real-data applications it also identifies clinically meaningful side features, combining predictive accuracy with interpretability.
📝 Abstract
Recent advances in drug discovery have demonstrated that incorporating side information (e.g., chemical properties of drugs and genomic information on diseases) often greatly improves prediction performance. However, these side features can vary widely in relevance and are often noisy and high-dimensional. We propose Bayesian Variable Selection-Guided Inductive Matrix Completion (BVSIMC), a new Bayesian model that performs variable selection on the side features used in drug discovery. By learning sparse latent embeddings, BVSIMC improves both predictive accuracy and interpretability. We validate our method through simulation studies and two drug discovery applications: 1) prediction of drug resistance in Mycobacterium tuberculosis, and 2) prediction of new drug-disease associations in computational drug repositioning. On both synthetic and real data, BVSIMC outperforms several other state-of-the-art methods in predictive performance. In the two real-data applications, BVSIMC further reveals the most clinically meaningful side features.
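To make the idea of sparse latent embeddings concrete, here is a minimal sketch. It is not the BVSIMC model: it assumes the common inductive matrix completion form M ≈ X U Vᵀ Yᵀ, where X and Y hold row-side and column-side features, and it replaces the Bayesian variable selection prior with a simpler row-wise group-lasso penalty fitted by alternating proximal gradient steps. A side feature counts as deselected when its entire row of U (or V) is shrunk to zero. All names (`fit_sparse_imc`, `row_soft_threshold`, `lam`, `rank`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def row_soft_threshold(W, tau):
    """Proximal operator of tau * sum_j ||W[j, :]||_2 (group lasso on rows)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale

def objective(M, mask, X, Y, U, V, lam):
    """Squared error on observed entries plus row-wise penalties on U and V."""
    resid = mask * (X @ U @ V.T @ Y.T - M)
    penalty = np.linalg.norm(U, axis=1).sum() + np.linalg.norm(V, axis=1).sum()
    return 0.5 * np.sum(resid ** 2) + lam * penalty

def fit_sparse_imc(M, mask, X, Y, rank=2, lam=0.5, n_iter=200, seed=0):
    """Alternating proximal gradient for M ~ X U V^T Y^T with row-sparse U, V.

    Assumption: a point-estimate stand-in for a Bayesian selection prior.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((X.shape[1], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    sx = np.linalg.norm(X, 2) ** 2  # squared largest singular value of X
    sy = np.linalg.norm(Y, 2) ** 2
    history = [objective(M, mask, X, Y, U, V, lam)]
    for _ in range(n_iter):
        # U-step: gradient step with step size 1/L, then row-wise shrinkage.
        B = Y @ V
        R = mask * (X @ U @ B.T - M)
        step = 1.0 / (sx * np.linalg.norm(B, 2) ** 2 + 1e-12)
        U = row_soft_threshold(U - step * (X.T @ R @ B), step * lam)
        # V-step: same scheme with the freshly updated U held fixed.
        A = X @ U
        R = mask * (A @ V.T @ Y.T - M)
        step = 1.0 / (sy * np.linalg.norm(A, 2) ** 2 + 1e-12)
        V = row_soft_threshold(V - step * (Y.T @ R.T @ A), step * lam)
        history.append(objective(M, mask, X, Y, U, V, lam))
    return U, V, history

# Synthetic example: only the first two features on each side carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 6))   # hypothetical row-side features
Y = rng.standard_normal((30, 5))   # hypothetical column-side features
U_true = np.zeros((6, 2))
U_true[:2] = rng.standard_normal((2, 2))
V_true = np.zeros((5, 2))
V_true[:2] = rng.standard_normal((2, 2))
M = X @ U_true @ V_true.T @ Y.T + 0.1 * rng.standard_normal((40, 30))
mask = (rng.random((40, 30)) < 0.6).astype(float)  # 1 = observed entry

U_hat, V_hat, history = fit_sparse_imc(M, mask, X, Y)
print("row norms of U_hat:", np.round(np.linalg.norm(U_hat, axis=1), 3))
print("row norms of V_hat:", np.round(np.linalg.norm(V_hat, axis=1), 3))
```

Rows of `U_hat` and `V_hat` with zero norm correspond to side features the penalty has removed. How aggressively features are dropped depends on `lam`, which would need tuning on real data. A fully Bayesian treatment such as the one the abstract describes would instead place a selection prior on these rows and report posterior quantities rather than a single point estimate.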