LIME: Link-based user-item Interaction Modeling with decoupled xor attention for Efficient test time scaling

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Recommender systems face three key scalability bottlenecks: modeling long user behavior sequences, retrieving from massive candidate item sets, and supporting high model capacity. Conventional Transformers incur inference cost that is quadratic in sequence length and linear in candidate set size, rendering them impractical for industrial-scale deployment. This paper proposes LIME, a novel architecture built on two innovations: low-rank *link embeddings*, which decouple user and candidate interactions so that attention weights can be precomputed, and *LIME-XOR*, a linear-complexity attention mechanism. Together, these innovations make inference cost nearly independent of both sequence length and candidate set size. Evaluated on multiple public and industrial benchmarks, LIME matches state-of-the-art Transformer accuracy while accelerating inference by 10×. Deployed on a real-world platform, it significantly improved user engagement. To the authors' knowledge, this is the first approach to achieve efficient, expressive recommendation over ultra-long sequences and ultra-large candidate sets simultaneously.

📝 Abstract
Scaling large recommendation systems requires advancing three major frontiers: processing longer user histories, expanding candidate sets, and increasing model capacity. While promising, transformers' computational cost scales quadratically with the user sequence length and linearly with the number of candidates. This trade-off makes it prohibitively expensive to expand candidate sets or increase sequence length at inference, despite the significant performance improvements. We introduce **LIME**, a novel architecture that resolves this trade-off. Through two key innovations, LIME fundamentally reduces computational complexity. First, low-rank "link embeddings" enable pre-computation of attention weights by decoupling user and candidate interactions, making the inference cost nearly independent of candidate set size. Second, a linear attention mechanism, **LIME-XOR**, reduces the complexity with respect to user sequence length from quadratic ($O(N^2)$) to linear ($O(N)$). Experiments on public and industrial datasets show LIME achieves near-parity with state-of-the-art transformers but with a 10× inference speedup on large candidate sets or long sequence lengths. When tested on a major recommendation platform, LIME improved user engagement while maintaining minimal inference costs with respect to candidate set size and user history length, establishing a new paradigm for efficient and expressive recommendation systems.
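To make the quadratic-to-linear claim concrete, the following is a minimal NumPy sketch of generic kernelized linear attention. It is an illustration of the complexity argument only, not LIME-XOR itself (the summary does not specify the paper's kernel or feature map; the ELU+1 map below is a common choice borrowed from the linear-attention literature and is an assumption here):

```python
import numpy as np

def phi(x):
    """ELU(x)+1 feature map (an assumed kernel, not the paper's);
    keeps features positive so the normalizer below is well defined."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def softmax_attention(Q, K, V):
    """Standard attention: builds an (N, N) weight matrix -> O(N^2)."""
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def linear_attention(Q, K, V):
    """Kernelized attention: a (d, d) summary K'^T V is built once in
    O(N d^2); each query then reads it in O(d^2) -> O(N) in length N."""
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                  # (d, d) summary of the whole sequence
    z = Kp.sum(axis=0)             # (d,) normalizer
    return (Qp @ kv) / (Qp @ z)[:, None]

rng = np.random.default_rng(0)
N, d = 512, 16
Q, K, V = rng.standard_normal((3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (512, 16)
```

The key point is that the sequence is compressed into a fixed-size `(d, d)` state, so doubling the history length doubles (rather than quadruples) the cost.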
Problem

Research questions and friction points this paper is trying to address.

Reduces quadratic complexity of transformers to linear scaling
Enables efficient inference with large candidate sets
Maintains performance while handling long user histories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Link embeddings decouple user-candidate interactions for efficiency
LIME-XOR linear attention reduces complexity from quadratic to linear
Pre-computed attention weights make inference cost candidate-independent
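The decoupling idea in the bullets above can be sketched as follows. This is a hypothetical illustration, not the paper's exact factorization: it assumes a small shared set of `M` link embeddings through which both sides interact, so the candidate-side summaries can be precomputed offline and the per-request (user-side) cost never touches the candidate count `C`:

```python
import numpy as np

rng = np.random.default_rng(0)
d, M = 32, 8             # embedding dim; number of link embeddings (assumed small)
N, C = 1000, 100_000     # user history length; candidate set size
links = rng.standard_normal((M, d))    # shared, learned link embeddings
history = rng.standard_normal((N, d))  # one user's behavior sequence
items = rng.standard_normal((C, d))    # candidate item embeddings

def attend(queries, keys_values):
    """Plain dot-product attention of queries over keys_values."""
    w = queries @ keys_values.T
    w = np.exp(w - w.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ keys_values

# Offline, per item: each candidate is summarized through the M link
# embeddings -- precomputable once, independent of any particular user.
item_side = attend(items, links)                  # (C, d)

# Online, per request: the history is summarized through the same links,
# so this cost depends on N and M but not on C.
user_side = attend(links, history).mean(axis=0)   # (d,)

# Scoring every candidate collapses to one (C, d) @ (d,) matmul.
scores = item_side @ user_side
print(scores.shape)  # (100000,)
```

Because user and candidate computations only meet in the final dot product, the expensive per-candidate attention never has to be recomputed per user, which is what makes inference cost nearly independent of candidate set size.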