🤖 AI Summary
In the initial retrieval stage of large-scale recommendation systems, conventional efficient methods such as K-Nearest Neighbors (KNN) suffer from low recall accuracy: they struggle to distinguish highly relevant and engaging items from merely relevant candidates, which limits Recall@200.
Method: We propose a deferred asynchronous retrieval mechanism that injects the capabilities of the full-complexity ranking model into the retrieval stage. Specifically, we leverage asynchronous offline pre-ranking to construct a high-quality candidate pool, and augment KNN with caching-aware indexing and multi-stage funneling, thereby bypassing online pre-ranking bottlenecks.
Contribution/Results: Our approach achieves a 2× improvement in offline Recall@200 without increasing online latency, and online A/B tests demonstrate a 0.8% lift in core engagement metrics. To our knowledge, this is the first work to offload high-complexity ranking capabilities onto an asynchronous offline retrieval path, significantly enhancing first-stage retrieval quality.
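The two-phase flow described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names (`offline_prerank`, `online_retrieve`), the scoring callable, and the top-K merge policy are all assumptions introduced for illustration.

```python
# Hypothetical sketch of a deferred asynchronous retrieval flow:
# an offline job pre-ranks a large candidate set per user with the
# full-complexity ranking model, and the online path merges that
# cached pool with standard KNN retrieval.

def offline_prerank(user_ids, candidate_ids, score, k=200):
    """Asynchronously score a large candidate set per user with the
    full ranking model and keep the top-k items in a per-user store."""
    store = {}
    for u in user_ids:
        ranked = sorted(candidate_ids, key=lambda i: score(u, i), reverse=True)
        store[u] = ranked[:k]
    return store

def online_retrieve(user_id, store, knn_retrieve, k=200):
    """At serving time, merge the precomputed high-quality pool with
    live KNN results, deduplicating in order; cached items can bypass
    online pre-ranking since they were already scored offline."""
    cached = store.get(user_id, [])
    fresh = knn_retrieve(user_id)
    seen, merged = set(), []
    for item in cached + fresh:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged[:k]
```

In a real system the store would be a low-latency key-value cache refreshed asynchronously, so the online path pays only a lookup, keeping serving latency flat.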
📝 Abstract
Modern large-scale recommender systems employ a multi-stage ranking funnel (Retrieval, Pre-ranking, Ranking) to balance engagement and computational constraints (latency, CPU). However, the initial retrieval stage, often relying on efficient but less precise methods like K-Nearest Neighbors (KNN), struggles to effectively surface the most engaging items from billion-scale catalogs, particularly in distinguishing highly relevant and engaging candidates from merely relevant ones. We introduce Recall Augmentation through Deferred Asynchronous Retrieval (RADAR), a novel framework that leverages asynchronous, offline computation to pre-rank a significantly larger candidate set per user using the full-complexity ranking model. These top-ranked items are stored and utilized as a high-quality retrieval source during online inference, bypassing the online retrieval and pre-ranking stages for these candidates. We demonstrate through offline experiments that RADAR significantly boosts recall (2× Recall@200 vs. a DNN retrieval baseline) by effectively combining a larger retrieved candidate set with a more powerful ranking model. Online A/B tests confirm a +0.8% lift in topline engagement metrics, validating RADAR as a practical and effective method to improve recommendation quality under strict online serving constraints.