Oxytrees: Model Trees for Bipartite Learning

📅 2025-11-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing bipartite learning methods are often tailored to a single application, so they either generalize poorly to other problems or scale badly when predicting interactions between pairs of instances (e.g., drug-target, RNA-disease). To address these limitations, the authors propose Oxytrees: proxy-based biclustering model trees. The core contributions are: (i) compressing the interaction matrix into row- and column-wise proxy matrices, which reduces training time without compromising predictive performance; (ii) a new leaf-assignment algorithm that reduces prediction time; and (iii) linear models with the Kronecker product kernel in the leaves, which yield shallower trees and faster training. Evaluated on 15 datasets, ensembles of Oxytrees are competitive with or better than state-of-the-art methods in most evaluation settings, particularly the inductive setting, and train up to 30 times faster than state-of-the-art biclustering forests. A Python API provides access to all datasets, methods, and evaluation measures used in the work.

📝 Abstract

Bipartite learning is a machine learning task that aims to predict interactions between pairs of instances. It has been applied to various domains, including drug-target interactions, RNA-disease associations, and regulatory network inference. Despite being widely investigated, current methods still have drawbacks: they are often designed for a specific application and thus do not generalize to other problems, or they present scalability issues. To address these challenges, we propose Oxytrees: proxy-based biclustering model trees. Oxytrees compress the interaction matrix into row- and column-wise proxy matrices, significantly reducing training time without compromising predictive performance. We also propose a new leaf-assignment algorithm that significantly reduces prediction time. Finally, Oxytrees employ linear models using the Kronecker product kernel in their leaves, resulting in shallower trees and thus even faster training. Using 15 datasets, we compared the predictive performance of ensembles of Oxytrees with that of the current state of the art. We achieved up to a 30-fold improvement in training time compared to state-of-the-art biclustering forests, while demonstrating competitive or superior performance in most evaluation settings, particularly in the inductive setting. Finally, we provide an intuitive Python API to access all datasets, methods, and evaluation measures used in this work, thus enabling reproducible research in this field.
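
To make the task setup concrete, here is a minimal sketch of how bipartite data is commonly laid out: one feature matrix per side plus an interaction matrix indexed by (row instance, column instance). The toy sizes, the variable names, and the reading of "inductive" as "predicting for instances that were never seen during training" are assumptions for illustration; the abstract does not spell out the paper's exact evaluation protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 6 row instances (e.g. drugs) and 5 column
# instances (e.g. targets), each side with its own feature vectors.
X_rows = rng.normal(size=(6, 3))       # row-instance features
X_cols = rng.normal(size=(5, 4))       # column-instance features
Y = rng.integers(0, 2, size=(6, 5))    # Y[i, j]: interaction label of pair (i, j)

# Hold out some row instances and some column instances entirely.
train_rows, test_rows = np.arange(4), np.arange(4, 6)
train_cols, test_cols = np.arange(3), np.arange(3, 5)

# Training block: pairs made of seen rows and seen columns.
Y_train = Y[np.ix_(train_rows, train_cols)]

# Test blocks where exactly one side of the pair is unseen.
Y_test_new_row = Y[np.ix_(test_rows, train_cols)]
Y_test_new_col = Y[np.ix_(train_rows, test_cols)]

# Test block where both the row and the column instance are unseen.
Y_test_new_both = Y[np.ix_(test_rows, test_cols)]
```

Splitting along both axes, rather than sampling individual cells of `Y`, is what separates predicting for new instances from filling in missing entries between known ones.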

Problem

Research questions and friction points this paper is trying to address.

Addressing lack of generalization in bipartite learning methods

Solving scalability issues in bipartite interaction prediction

Improving training and prediction speed for biclustering models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proxy-based biclustering model trees for bipartite learning

Kronecker product kernel linear models in leaves

Novel leaf-assignment algorithm for faster prediction
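
The Kronecker product kernel mentioned above can be illustrated with a small NumPy sketch. It builds a pairwise kernel from a row-side and a column-side kernel, then fits a ridge-regularized model on it. This is a generic illustration, not the Oxytrees implementation: the linear base kernels, the regularization strength `lam`, the toy data, and the idea of fitting on a single block of pairs (as a stand-in for the pairs in one leaf) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical block of pairs: 4 row instances x 3 column instances.
X_rows = rng.normal(size=(4, 3))
X_cols = rng.normal(size=(3, 2))
Y = rng.normal(size=(4, 3))            # one target per (row, column) pair

# Linear base kernels on each side (an assumed choice).
K_rows = X_rows @ X_rows.T             # shape (4, 4)
K_cols = X_cols @ X_cols.T             # shape (3, 3)

# Kronecker product kernel over pairs: the entry for pairs (i, j) and
# (k, l) equals K_rows[i, k] * K_cols[j, l]. Pairs are ordered row-major,
# which matches Y.ravel().
K_pairs = np.kron(K_rows, K_cols)      # shape (12, 12)

# Kernel ridge regression on the pairs (lam is an assumed hyperparameter).
lam = 1.0
alpha = np.linalg.solve(K_pairs + lam * np.eye(K_pairs.shape[0]), Y.ravel())

# Predict for pairs of new row instances and new column instances.
X_rows_new = rng.normal(size=(2, 3))
X_cols_new = rng.normal(size=(2, 2))
K_new = np.kron(X_rows_new @ X_rows.T, X_cols_new @ X_cols.T)  # shape (4, 12)
Y_pred = (K_new @ alpha).reshape(2, 2)
```

With linear base kernels, this pairwise kernel is the inner product of the Kronecker products of the two feature vectors, so the fitted model is linear in those product features. Materializing `K_pairs` is only practical for small blocks, since its size grows with the square of the number of pairs.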
