🤖 AI Summary
This work addresses the challenge of miRNA–mRNA target prediction, where an enormous number of candidate binding sites coexists with scarce pair-level labels and limited computational resources. The authors formalize this regime as Budgeted Relational Multi-Instance Learning (BR-MIL), in which at most K candidate sites per bag may receive expensive encoding and relational processing, and propose PAIR-Former, the first BR-MIL framework. The approach employs a two-stage pipeline: a low-cost scan over the entire candidate pool followed by a diversity-based selection of up to K sites on CPU, whose interdependencies are captured by a permutation-invariant Set Transformer aggregator. Evaluated on the miRAW dataset at a practical budget of K=64, the method outperforms strong pooling baselines while offering a controllable accuracy–compute trade-off as K varies, backed by theoretical guarantees on approximation error and generalization.
📝 Abstract
Functional miRNA--mRNA targeting is a large-bag prediction problem: each transcript yields a heavy-tailed pool of candidate target sites (CTSs), yet only a pair-level label is observed. We formalize this regime as \emph{Budgeted Relational Multi-Instance Learning (BR-MIL)}, where at most $K$ instances per bag may receive expensive encoding and relational processing under a hard compute budget. We propose \textbf{PAIR-Former} (Pool-Aware Instance-Relational Transformer), a BR-MIL pipeline that performs a cheap full-pool scan, selects up to $K$ diverse CTSs on CPU, and applies a permutation-invariant Set Transformer aggregator on the selected tokens. On miRAW, PAIR-Former outperforms strong pooling baselines at a practical operating budget ($K^\star{=}64$) while providing a controllable accuracy--compute trade-off as $K$ varies. We further provide theory linking budgeted selection to (i) approximation error decreasing with $K$ and (ii) generalization terms governed by $K$ in the expensive relational component.
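The budgeted pipeline described above can be sketched in a few dozen lines. This is a minimal illustration, not the authors' implementation: it assumes a linear scorer for the cheap full-pool scan, a greedy farthest-point heuristic for the diversity-based selection of $K$ sites, and a single learned query attending over the selected tokens as a stand-in for the Set Transformer aggregator. All function names and weight shapes here are hypothetical.

```python
import numpy as np

def cheap_scan(pool_feats, w):
    # Stage 1: low-cost linear scoring of every candidate target site
    # in the full pool (hypothetical stand-in for the cheap scan).
    return pool_feats @ w

def select_diverse_topk(pool_feats, scores, K):
    # Stage 2: greedy farthest-point selection on CPU. Seed with the
    # highest-scoring site, then repeatedly add the site farthest (in
    # feature space) from everything already selected.
    idx = [int(np.argmax(scores))]
    while len(idx) < min(K, len(pool_feats)):
        dists = np.linalg.norm(
            pool_feats[:, None, :] - pool_feats[idx][None, :, :], axis=-1
        ).min(axis=1)
        dists[idx] = -np.inf  # never re-select a chosen site
        idx.append(int(np.argmax(dists)))
    return np.array(idx)

def attention_pool(tokens, q, Wk, Wv):
    # Permutation-invariant readout over the K selected tokens: one
    # learned query attends to the set (a PMA-style simplification of
    # the Set Transformer aggregator).
    keys, vals = tokens @ Wk, tokens @ Wv
    logits = keys @ q / np.sqrt(q.shape[0])
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()
    return attn @ vals  # order of tokens does not affect the result
```

Because the expensive relational step only ever sees the $K$ selected tokens, its cost is fixed by the budget regardless of how heavy-tailed the candidate pool is; the softmax-weighted sum makes the bag representation invariant to the order in which sites were selected.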