🤖 AI Summary
This study addresses the high prevalence of missing data in the DrugMatrix toxicogenomics database. We propose a three-dimensional structured imputation method based on non-negative tensor completion (NTC), which explicitly models the coupled relationships among tissue type, treatment condition, and gene expression. This preserves the intrinsic data distribution and captures organ-specific variability. Unlike conventional matrix factorization or Canonical Polyadic (CP) decomposition approaches, our method combines non-negative tensor decomposition with a machine learning optimization formulation. Experimental results show that it achieves lower mean squared error and mean absolute error than these benchmark methods. We applied the method to complete the world's largest in vivo toxicogenomics database, which comprises multi-organ, multi-dose, and multi-time-point measurements, with improved accuracy. This advance makes the completed data more reliable for drug toxicity prediction and mechanistic interpretation, and it offers a promising basis for future cross-species studies.
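The summary names non-negative tensor completion but does not specify the model or the optimizer. As a minimal sketch, assuming a rank-R non-negative CP model fitted only on the observed entries with Lee-Seung-style multiplicative updates (our illustrative choice, not necessarily the authors' method), completing a tissue × treatment × gene tensor could look like this. The rank, iteration count, and function names (`ntc_cp`, `cp_reconstruct`, `masked_loss`) are hypothetical. The observed values must be non-negative, and the toy tensor is synthetic.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild the full (I, J, K) tensor from CP factors: sum over r of a_r o b_r o c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def ntc_cp(X, mask, rank=3, n_iter=1000, eps=1e-9, seed=0):
    """Non-negative CP tensor completion via masked multiplicative updates (sketch).

    X    : (I, J, K) array (e.g. tissue x treatment x gene); values where
           mask is False are ignored and may be NaN.
    mask : boolean array, True where X is observed; observed values must be >= 0.
    Returns non-negative factors A (I x rank), B (J x rank), C (K x rank).
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Strictly positive initialization keeps the multiplicative updates well defined
    A, B, C = (rng.random((n, rank)) + 0.1 for n in (I, J, K))
    W = mask.astype(float)
    Xw = np.where(mask, X, 0.0)  # unobserved entries contribute nothing to the fit
    for _ in range(n_iter):
        # Lee-Seung style update for the loss ||W * (X - Xhat)||^2, one mode at a time
        num = np.einsum("ijk,jr,kr->ir", Xw, B, C)
        den = np.einsum("ijk,jr,kr->ir", W * cp_reconstruct(A, B, C), B, C)
        A *= num / (den + eps)
        num = np.einsum("ijk,ir,kr->jr", Xw, A, C)
        den = np.einsum("ijk,ir,kr->jr", W * cp_reconstruct(A, B, C), A, C)
        B *= num / (den + eps)
        num = np.einsum("ijk,ir,jr->kr", Xw, A, B)
        den = np.einsum("ijk,ir,jr->kr", W * cp_reconstruct(A, B, C), A, B)
        C *= num / (den + eps)
    return A, B, C


def masked_loss(X, mask, A, B, C):
    """Mean squared error over the observed entries only."""
    return float(np.mean((X[mask] - cp_reconstruct(A, B, C)[mask]) ** 2))


# Toy example: synthetic rank-3 tensor of 6 "tissues" x 8 "treatments" x 10 "genes"
rng = np.random.default_rng(42)
truth = cp_reconstruct(rng.random((6, 3)), rng.random((8, 3)), rng.random((10, 3)))
mask = rng.random(truth.shape) > 0.3  # hide roughly 30% of the entries
X = np.where(mask, truth, np.nan)

A, B, C = ntc_cp(X, mask)
completed = cp_reconstruct(A, B, C)
heldout_mse = float(np.mean((completed[~mask] - truth[~mask]) ** 2))
baseline_mse = float(np.mean((np.nanmean(X) - truth[~mask]) ** 2))  # global-mean imputation
print(f"held-out MSE: {heldout_mse:.2e} (mean imputation: {baseline_mse:.2e})")
```

Multiplicative updates keep every factor non-negative as long as the factors start positive and the observed data are non-negative, which is why they are a common choice for a sketch like this. A real pipeline would typically add a convergence check and select the rank on held-out entries.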
📝 Abstract
We explore applying a tensor completion approach to fill in the missing entries of the DrugMatrix toxicogenomics dataset. Our hypothesis is that preserving the three-dimensional structure of the data (tissue × treatment × transcriptomic measurement) and leveraging a machine learning formulation will improve upon prior state-of-the-art results. Our results show that the tensor-based method reflects the original data distribution more accurately and effectively captures organ-specific variability. It achieved lower mean squared error (MSE) and mean absolute error (MAE) than both conventional Canonical Polyadic (CP) decomposition and two-dimensional matrix factorization methods. In addition, our non-negative tensor completion implementation reveals relationships among tissues. Our findings not only complete the world's largest in vivo toxicogenomics database with improved accuracy but also offer a promising methodology for future studies of drug effects across species, for example, from rats to humans.
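The comparison above rests on scoring each method's predictions on entries it never saw, and on the difference between a three-way tensor and the matrix a 2-D factorization operates on. The abstract does not describe the exact evaluation protocol, so the following is only a sketch under assumptions: a random fraction of the observed entries is held out, MSE and MAE are computed on those entries alone, and the 2-D baseline is taken to see a mode-n unfolding of the tensor. The function names (`split_observed`, `mse_mae`, `unfold`) and the `test_frac` value are hypothetical.

```python
import numpy as np


def split_observed(mask, test_frac=0.2, seed=0):
    """Hide a random fraction of the observed entries for evaluation.

    Returns (train_mask, test_mask); both are subsets of `mask` and are disjoint.
    """
    rng = np.random.default_rng(seed)
    test_mask = mask & (rng.random(mask.shape) < test_frac)
    return mask & ~test_mask, test_mask


def mse_mae(truth, pred, test_mask):
    """Mean squared error and mean absolute error on the held-out entries only."""
    err = truth[test_mask] - pred[test_mask]
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))


def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the remaining axes.

    A 2-D matrix factorization baseline works on a matrix like this one, where
    two of the three indices (e.g. treatment and gene) are merged into one axis.
    """
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


T = np.arange(60, dtype=float).reshape(3, 4, 5)  # toy tissue x treatment x gene tensor
mask = np.ones(T.shape, dtype=bool)              # toy case: every entry observed
train_mask, test_mask = split_observed(mask)
pred = T + 0.5                                   # stand-in predictions, off by 0.5 everywhere
mse, mae = mse_mae(T, pred, test_mask)
print(unfold(T, 0).shape, unfold(T, 1).shape)    # (3, 20) (4, 15)
```

A completion method would be fitted on `train_mask` only and then scored with `mse_mae` on `test_mask`. Under this protocol, a tensor method and a matrix baseline applied to an unfolding can be compared on exactly the same held-out entries.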