🤖 AI Summary
Existing bipartite learning methods, which predict interactions between pairs of instances (e.g., drug–target, RNA–disease), are often designed for a single application and therefore either fail to generalize to other problems or scale poorly. To address these limitations, we propose Oxytrees: proxy-based biclustering model trees. The core contributions are: (i) compressing the interaction matrix into row-wise and column-wise proxy matrices, which reduces training time without compromising predictive performance; (ii) a new leaf-assignment algorithm that reduces prediction time; and (iii) linear models with the Kronecker product kernel in the leaves, which yield shallower trees and thus faster training. Evaluated as ensembles on 15 datasets, Oxytrees train up to 30× faster than state-of-the-art biclustering forests while showing competitive or superior predictive performance in most evaluation settings, particularly the inductive setting. A Python API gives access to all datasets, methods, and evaluation measures used in the work, supporting reproducible research.
📝 Abstract
Bipartite learning is a machine learning task that aims to predict interactions between pairs of instances. It has been applied to various domains, including drug-target interactions, RNA-disease associations, and regulatory network inference. Despite being widely investigated, current methods still present drawbacks: they are often designed for a specific application and thus do not generalize to other problems, or they present scalability issues. To address these challenges, we propose Oxytrees: proxy-based biclustering model trees. Oxytrees compress the interaction matrix into row- and column-wise proxy matrices, significantly reducing training time without compromising predictive performance. We also propose a new leaf-assignment algorithm that significantly reduces the time taken for prediction. Furthermore, Oxytrees employ linear models using the Kronecker product kernel in their leaves, resulting in shallower trees and thus even faster training. Using 15 datasets, we compared the predictive performance of ensembles of Oxytrees with that of the current state-of-the-art. We achieved up to 30-fold faster training than state-of-the-art biclustering forests, while demonstrating competitive or superior performance in most evaluation settings, particularly in the inductive setting. Finally, we provide an intuitive Python API to access all datasets, methods and evaluation measures used in this work, thus enabling reproducible research in this field.
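For readers unfamiliar with the Kronecker product kernel mentioned above, the following is a minimal NumPy sketch of the general technique, not the Oxytrees implementation or its API. It assumes a pairwise kernel of the form k((r, c), (r', c')) = k_rows(r, r') · k_cols(c, c') and fits a ridge-regularized linear model in that kernel space on a full toy interaction matrix; Oxytrees fit such models inside tree leaves, which is not reproduced here. The function names (`linear_kernel`, `fit_kronecker_ridge`, `predict`), the random toy data, and the regularization value are all illustrative assumptions.

```python
import numpy as np


def linear_kernel(A, B):
    """Plain linear kernel between two feature matrices (illustrative choice)."""
    return A @ B.T


def fit_kronecker_ridge(K_rows, K_cols, Y, lam=1.0):
    """Solve K_rows @ A @ K_cols + lam * A = Y for the dual coefficients A.

    This is ridge regression under the Kronecker product kernel, solved via
    the eigendecompositions of the two small kernels, so the large pairwise
    kernel np.kron(K_cols, K_rows) is never formed.
    """
    evals_r, U = np.linalg.eigh(K_rows)  # K_rows = U diag(evals_r) U^T
    evals_c, V = np.linalg.eigh(K_cols)  # K_cols = V diag(evals_c) V^T
    denom = np.outer(evals_r, evals_c) + lam
    return U @ ((U.T @ Y @ V) / denom) @ V.T


def predict(K_new_rows, K_new_cols, A):
    """Score pairs of new row and column instances (inductive prediction)."""
    return K_new_rows @ A @ K_new_cols.T


# Toy data: 6 row instances (e.g., drugs) and 5 column instances (e.g., targets).
rng = np.random.default_rng(0)
X_rows = rng.normal(size=(6, 3))
X_cols = rng.normal(size=(5, 4))
Y = rng.integers(0, 2, size=(6, 5)).astype(float)  # binary interaction matrix

lam = 0.5
K_rows = linear_kernel(X_rows, X_rows)
K_cols = linear_kernel(X_cols, X_cols)
A = fit_kronecker_ridge(K_rows, K_cols, Y, lam=lam)

# Sanity check against the naive solve that builds the full pairwise kernel.
K_pairs = np.kron(K_cols, K_rows)  # shape (30, 30)
a_naive = np.linalg.solve(K_pairs + lam * np.eye(K_pairs.shape[0]),
                          Y.flatten(order="F"))  # column-major vec(Y)
assert np.allclose(A.flatten(order="F"), a_naive)

# Score interactions between 2 unseen row instances and 3 unseen column instances.
X_new_rows = rng.normal(size=(2, 3))
X_new_cols = rng.normal(size=(3, 4))
scores = predict(linear_kernel(X_new_rows, X_rows),
                 linear_kernel(X_new_cols, X_cols), A)
```

The eigendecomposition route works on the two small kernel matrices separately, whereas the naive solve operates on a matrix whose side length is the number of row instances times the number of column instances. This is a general property of Kronecker kernel ridge regression and is shown here only to illustrate why such models are practical on interaction matrices.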