Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching

📅 2025-05-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In large language model (LLM)-driven pairwise ranking prompting (PRP), conventional evaluation paradigms based on comparison counts fail when LLM inference cost dominates overall computational expense. Method: We propose an evaluation framework centered on LLM inference overhead, redefining algorithmic complexity for PRP and showing that classical O(n log n) sorting algorithms can underperform O(n²) alternatives when inference costs are high. The approach combines batched processing and response caching with dynamic batch scheduling and fine-grained modeling of LLM inference. Contribution/Results: Experiments demonstrate up to a 47% reduction in LLM invocations and substantial throughput improvements in PRP systems. This work establishes both theoretical foundations and practical engineering pathways for LLM-native ranking systems.
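The response-caching idea described above can be sketched as a memoized comparator. This is a minimal illustration, not the paper's implementation: `llm_compare` is a hypothetical stand-in for a real LLM judgment call, and the pair is normalized so that (a, b) and (b, a) share one cache entry, so any repeated comparison costs zero extra inferences.

```python
import functools

# Counts simulated LLM invocations so cache savings are visible.
calls = 0

def llm_compare(a: str, b: str) -> bool:
    """Hypothetical LLM judgment: True if `a` should rank before `b`.

    A real system would send a pairwise prompt to a model here; a
    lexicographic comparison stands in for the model's answer.
    """
    global calls
    calls += 1
    return a < b

@functools.lru_cache(maxsize=None)
def _cached(a: str, b: str) -> bool:
    return llm_compare(a, b)

def compare(a: str, b: str) -> bool:
    # Normalize pair order so both orientations hit the same cache entry.
    if a <= b:
        return _cached(a, b)
    return not _cached(b, a)
```

Plugged into a sorting routine via `functools.cmp_to_key`, re-ranking the same items triggers no new inferences, since every pair is already cached.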

📝 Abstract
We introduce a novel framework for analyzing sorting algorithms in pairwise ranking prompting (PRP), re-centering the cost model on LLM inferences rather than traditional pairwise comparisons. Classical comparison-count metrics have long been used to gauge efficiency, but our analysis reveals that expensive LLM inferences overturn their predictions; our framework therefore encourages strategies such as batching and caching to mitigate inference costs. We show that algorithms optimal in the classical setting can lose their efficiency advantage once such optimizations are applied and LLM inferences dominate the cost.
Problem

Research questions and friction points this paper is trying to address.

Analyzing sorting algorithms in LLM-based pairwise ranking
Shifting cost model from comparisons to LLM inferences
Re-evaluating algorithm efficiency with batching and caching
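The cost-model shift listed above can be illustrated with a toy back-of-the-envelope model (my own illustration, not the paper's fine-grained inference model). The intuition: mergesort's comparisons are data-dependent and largely sequential, so batching cannot collapse them, whereas an O(n²) all-pairs tournament issues every comparison independently and can pack them into very few batched LLM calls.

```python
import math

def batched_calls_all_pairs(n: int, batch: int) -> int:
    """All n(n-1)/2 comparisons are mutually independent, so they can
    be packed freely: one LLM invocation per full batch of prompts."""
    return math.ceil(n * (n - 1) / 2 / batch)

def batched_calls_mergesort(n: int) -> int:
    """Rough sequential-depth model: merges on the same level can run
    in parallel, but comparisons inside one merge depend on each
    other, so each level costs about the length of its longest merge
    in sequential LLM rounds, regardless of batch size."""
    rounds, size = 0, 1
    while size < n:
        rounds += min(2 * size, n)
        size *= 2
    return rounds
```

For n = 100 items and a batch capacity of 5000 prompts, the all-pairs scheme needs a single invocation while the mergesort depth model charges 226 sequential rounds, which is the sense in which a classically worse algorithm can win under this cost model.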
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel framework for LLM-based pairwise ranking
Focus on batching and caching to reduce costs
Re-evaluation of classical algorithms under LLM inference costs
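The batching idea above can be sketched as packing many pairwise prompts into few LLM invocations. Everything here is hypothetical scaffolding: `llm_batch_call` stands in for an endpoint that answers a list of prompts in one inference, and the "A"/"B" answer format is an assumption for illustration.

```python
from typing import Callable, Sequence

def batched_compare(
    pairs: Sequence[tuple[str, str]],
    llm_batch_call: Callable[[list[str]], list[str]],
    batch_size: int = 8,
) -> list[bool]:
    """Resolve many pairwise comparisons with as few LLM invocations
    as possible by packing `batch_size` prompts per call.

    Returns one bool per pair: True if the first item ranks higher.
    """
    results: list[bool] = []
    for i in range(0, len(pairs), batch_size):
        chunk = pairs[i : i + batch_size]
        prompts = [f"Which ranks higher: A='{a}' or B='{b}'?" for a, b in chunk]
        answers = llm_batch_call(prompts)
        results.extend(ans.strip() == "A" for ans in answers)
    return results
```

Combined with the caching comparator, a scheduler would first filter out pairs already answered in the cache and batch only the remainder, which is the synergy the summary attributes to the paper.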
Juan Wisznia
Unknown affiliation
Cecilia Bolanos
Departamento de Computación, FCEyN, Universidad de Buenos Aires; Instituto de Ciencias de la Computación, FCEyN, Universidad de Buenos Aires
Juan Tollo
Departamento de Computación, FCEyN, Universidad de Buenos Aires
Giovanni Marraffini
Paris Brain Institute
Computational Neuroscience, Artificial Intelligence
Agustín Gianolini
Departamento de Computación, FCEyN, Universidad de Buenos Aires; Lumina Labs*
Noe Hsueh
Departamento de Computación, FCEyN, Universidad de Buenos Aires
Luciano Del Corro
Microsoft Research
natural language understanding, information extraction, relation extraction, knowledge bases