OG-Rank: Learning to Rank Fast and Slow with Uncertainty and Reward-Trend Guided Adaptive Exploration

📅 2025-10-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the challenge of balancing low latency and interpretability in clinical order re-ranking with OG-Rank, a lightweight single-decoder re-ranking model. It combines three components: (1) a pooled first-token scoring signal that ranks all candidates in a single pass on the fast path; (2) an uncertainty gate that detects ambiguous candidate lists and triggers explanation generation only for them; and (3) a curriculum learning strategy that concentrates training effort on hard cases, coupled with reward-trend-guided adaptive exploration, to balance ranking quality against latency. On encounter-scoped clinical order selection, the fast path reaches a Recall@1 of 0.45 and an nDCG@20 of 0.625; when the gate activates (at a 45% gate rate), these rise to 0.56 and 0.699 respectively. Encoder-based baselines trail in effectiveness, and end-to-end latency stays predictable.
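For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Recall@k and nDCG@k are commonly computed for a single ranked list. It assumes the standard textbook definitions with a linear gain and a log2 position discount; the summary does not state which exact variant the paper uses, and the function names are illustrative only.

```python
import math
from typing import Sequence


def recall_at_k(ranked_relevance: Sequence[float], k: int) -> float:
    """Fraction of all relevant items that appear in the top k positions.

    ranked_relevance[i] is the relevance label of the item the system
    placed at rank i (0 means not relevant).
    """
    total_relevant = sum(1 for rel in ranked_relevance if rel > 0)
    if total_relevant == 0:
        return 0.0
    hits = sum(1 for rel in ranked_relevance[:k] if rel > 0)
    return hits / total_relevant


def dcg_at_k(ranked_relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain (assumed variant)."""
    return sum(
        rel / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, rel in enumerate(ranked_relevance[:k])
    )


def ndcg_at_k(ranked_relevance: Sequence[float], k: int) -> float:
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal_order = sorted(ranked_relevance, reverse=True)
    ideal_dcg = dcg_at_k(ideal_order, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevance, k) / ideal_dcg
```

A corpus-level figure such as the ones quoted above would typically be the mean of these per-list values over all evaluation queries.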

📝 Abstract

Clinicians need ranking systems that work in real time and still justify their choices. Motivated by the need for a low-latency, decoder-based reranker, we present OG-Rank, a single-decoder approach that pairs a pooled first-token scoring signal with an uncertainty-gated explanation step. The model scores all candidates in one pass and generates a brief, structured rationale only when the list is genuinely ambiguous, keeping latency predictable. Trained with a curriculum that concentrates effort on hard cases, OG-Rank delivers strong effectiveness on encounter-scoped order selection (fast path: Recall@1 of 0.45, nDCG@20 of 0.625) and improves further when the gate activates (Recall@1 of 0.56, nDCG@20 of 0.699 at a 45% gate rate), while compact backbones show similar gains under the same policy. Encoder baselines trail in both effectiveness and flexibility. The result is a practical recipe: rank fast by default and explain when it helps, a pattern that applies broadly to decision tasks where selective generation buys accuracy at acceptable cost. The single-policy design simplifies deployment and budget planning, and the curriculum principle (spend more on the hard cases, less on the easy ones) readily transfers beyond clinical order selection.
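The "rank fast by default, explain when it helps" control flow can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how uncertainty is measured, so the sketch assumes normalized softmax entropy over the fast-path scores, and `explain_and_rescore`, `entropy_threshold`, and the other names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over candidate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy scaled to [0, 1]; 1 means a uniform distribution."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def rank_with_gate(
    scores: Sequence[float],
    explain_and_rescore: Callable[[Sequence[float]], Sequence[float]],
    entropy_threshold: float = 0.8,  # assumed value, would need tuning
) -> Tuple[List[int], bool]:
    """Return candidate indices, best first, and whether the gate fired.

    The slow path (explain_and_rescore) is a hypothetical callable that
    stands in for rationale generation plus rescoring; it runs only when
    the fast-path scores look ambiguous.
    """
    if not scores:
        return [], False
    gate_fired = normalized_entropy(softmax(scores)) > entropy_threshold
    final_scores = list(explain_and_rescore(scores)) if gate_fired else list(scores)
    order = sorted(range(len(final_scores)), key=lambda i: final_scores[i], reverse=True)
    return order, gate_fired
```

In a design like this, the threshold determines the fraction of lists that take the slow path, which is one way a target gate rate such as the 45% mentioned above could be set and budgeted for.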

Problem

Research questions and friction points this paper is trying to address.

Develops a low-latency ranking system for real-time clinical decisions

Generates explanations only when rankings are uncertain or ambiguous

Uses adaptive training to focus computational effort on difficult cases

Innovation

Methods, ideas, or system contributions that make the work stand out.

A single decoder ranks candidates with pooled first-token scoring

Uncertainty-gated explanation step activates for ambiguous cases

Curriculum training concentrates effort on hard ranking cases
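The curriculum idea of spending more training effort on hard cases can be illustrated with difficulty-weighted sampling. This is a generic sketch and not the paper's curriculum, which is not specified on this page: it assumes difficulty is approximated by each example's most recent loss, and the `floor` parameter and function names are hypothetical.

```python
import random
from typing import Hashable, List, Sequence


def hard_case_weights(losses: Sequence[float], floor: float = 0.05) -> List[float]:
    """Turn per-example losses into sampling probabilities.

    Higher loss means higher probability. The floor (an assumed knob)
    keeps easy examples from disappearing from training entirely.
    """
    count = len(losses)
    if count == 0:
        return []
    total = sum(losses)
    if total <= 0:
        return [1.0 / count] * count  # no difficulty signal: sample uniformly
    raw = [max(loss / total, floor / count) for loss in losses]
    norm = sum(raw)
    return [w / norm for w in raw]


def sample_batch(
    example_ids: Sequence[Hashable],
    losses: Sequence[float],
    batch_size: int,
    seed: int = 0,
) -> List[Hashable]:
    """Draw a batch with replacement, favoring high-loss examples."""
    rng = random.Random(seed)  # local generator so sampling is reproducible
    weights = hard_case_weights(losses)
    return rng.choices(list(example_ids), weights=weights, k=batch_size)
```

Refreshing the losses after each epoch would shift the sampling distribution as examples become easier, which is the general behavior a curriculum of this kind aims for.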

🔎 Similar Papers

Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations