🤖 AI Summary
In large language model (LLM)-driven pairwise ranking prompting (PRP), conventional evaluation paradigms based on comparison counts break down when LLM inference cost dominates overall computational expense. Method: We propose an evaluation framework centered on LLM inference overhead, redefining algorithmic complexity for PRP and showing that classical O(n log n) sorting algorithms can underperform O(n²) alternatives under high inference costs. Our approach combines batched processing and response caching with dynamic batch scheduling and fine-grained modeling of LLM inference. Contribution/Results: Experiments demonstrate up to a 47% reduction in LLM invocations and substantial throughput improvements in PRP systems. This work establishes both theoretical foundations and practical engineering pathways for LLM-native ranking systems.
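The response-caching idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm_compare` is a hypothetical stand-in that simulates a pairwise ranking prompt with a numeric comparison, and `functools.lru_cache` plays the role of the response cache, so re-ranking the same items issues no additional inferences.

```python
import functools

llm_calls = 0  # number of simulated LLM inferences issued

def llm_compare(a, b):
    # Hypothetical stand-in for a pairwise ranking prompt: a real PRP
    # system would ask an LLM which of the two items ranks higher.
    global llm_calls
    llm_calls += 1
    return a < b

@functools.lru_cache(maxsize=None)
def cached_compare(a, b):
    # Response caching: each distinct ordered pair hits the "LLM" at most once.
    return llm_compare(a, b)

def prp_bubble_sort(items):
    # An O(n^2)-comparison sort driven by the cached comparator.
    items = list(items)
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if not cached_compare(items[j], items[j + 1]):
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

ranked = prp_bubble_sort([4, 1, 3, 5, 2])
calls_first = llm_calls
prp_bubble_sort([4, 1, 3, 5, 2])  # re-ranking: same comparisons, all cache hits
assert llm_calls == calls_first   # no new inferences on the second pass
```

The cache key preserves pair order, which matters because PRP prompts can be order-sensitive: swapping the two items in the prompt may change the LLM's answer, so (a, b) and (b, a) are cached separately.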
📝 Abstract
We introduce a framework for analyzing sorting algorithms in pairwise ranking prompting (PRP) that re-centers the cost model on LLM inferences rather than raw pairwise comparisons. Classical comparison-count metrics have traditionally been used to gauge efficiency, but our analysis shows that expensive LLM inferences overturn their predictions; accordingly, the framework motivates strategies such as batching and caching to mitigate inference cost. We show that algorithms optimal in the classical setting can become suboptimal once LLM inferences dominate the cost and such optimizations are applied.
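The claim that a classically suboptimal algorithm can win under batching has a simple intuition, sketched below with odd-even transposition sort: although it performs O(n²) comparisons in total, every comparison within a round is independent, so an entire round can be issued as a single batched inference, whereas mergesort's comparisons are sequentially dependent. The snippet is illustrative only; `llm_compare_batch` is a hypothetical batched PRP call simulated by numeric comparison.

```python
batch_calls = 0  # number of simulated batched LLM inference rounds

def llm_compare_batch(pairs):
    # Hypothetical batched PRP call: one inference round judges many
    # independent pairs at once (simulated by numeric comparison).
    global batch_calls
    batch_calls += 1
    return [a < b for a, b in pairs]

def odd_even_sort(items):
    # Odd-even transposition sort: O(n^2) comparisons overall, but the
    # comparisons within each of its n rounds are independent, so each
    # round costs exactly ONE batched inference.
    items = list(items)
    n = len(items)
    for rnd in range(n):
        idx = list(range(rnd % 2, n - 1, 2))  # disjoint adjacent pairs
        results = llm_compare_batch([(items[i], items[i + 1]) for i in idx])
        for i, in_order in zip(idx, results):
            if not in_order:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

ranked = odd_even_sort([4, 1, 6, 3, 5, 2])
# n = 6 items -> 6 batched inference rounds, versus O(n log n)
# strictly sequential comparator calls for mergesort.
```

When per-call latency dominates, n batched rounds can finish well before O(n log n) dependent round-trips, which is the intuition behind the summary's O(n²)-beats-O(n log n) result.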