🤖 AI Summary
Existing learned sorting algorithms lack theoretical complexity guarantees. This paper proposes PCF Learned Sort, which it presents as the first learned sorting algorithm with rigorous theoretical guarantees. It models the underlying data distribution using piecewise constant functions (PCFs), integrates empirical distribution learning with adaptive bucketing, and proves an expected time complexity of $O(n \log \log n)$ under mild assumptions on the data distribution. This bound provides the first theoretically grounded explanation for why learned sorting can run faster than the classical $\Omega(n \log n)$ lower bound, which applies only to comparison-based sorting. Both theoretical analysis and empirical evaluation, across synthetic and real-world datasets, support the $O(n \log \log n)$ scaling behavior and show improvements over traditional sorting algorithms. The core contribution is a provably efficient, data-aware sorting framework that formally connects distributional assumptions to algorithmic performance guarantees.
📝 Abstract
Sorting is one of the most fundamental algorithms in computer science. Recently, Learned Sorts, which use machine learning to improve sorting speed, have attracted attention. While existing studies show that Learned Sort is empirically faster than classical sorting algorithms, they do not provide theoretical guarantees about its computational complexity. We propose Piecewise Constant Function (PCF) Learned Sort, a theoretically guaranteed Learned Sort algorithm. We prove that the expected complexity of PCF Learned Sort is $\mathcal{O}(n \log \log n)$ under mild assumptions on the data distribution. We also confirm empirically that PCF Learned Sort has a computational complexity of $\mathcal{O}(n \log \log n)$ on both synthetic and real datasets. This is the first study to theoretically support the empirical success of Learned Sort, and provides evidence for why Learned Sort is fast. The code is available at https://github.com/atsukisato/PCF_Learned_Sort.
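
To make the idea concrete, here is a minimal Python sketch of a bucket sort driven by a piecewise constant CDF model. It is an illustration under stated assumptions, not the authors' implementation (see the repository above for that). The function names `fit_pcf_cdf` and `pcf_learned_sort`, the choice of `n` equal-width intervals, the `isqrt(n)` bucket count, the `small` cutoff, and the fallback rule for oversized buckets are all hypothetical choices made for this example; the paper's actual parameters and analysis may differ.

```python
import math


def fit_pcf_cdf(values, num_intervals):
    """Fit a piecewise constant approximation of the empirical CDF.

    The range [min, max] is split into equal-width intervals, and the
    model is constant on each interval. Assumes min(values) < max(values).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_intervals
    counts = [0] * num_intervals
    for v in values:
        counts[min(int((v - lo) / width), num_intervals - 1)] += 1

    # cum[i] = fraction of elements that fall in intervals strictly before i.
    cum, total = [], 0
    for c in counts:
        cum.append(total / len(values))
        total += c

    def cdf(x):
        return cum[min(int((x - lo) / width), num_intervals - 1)]

    return cdf


def pcf_learned_sort(values, small=32):
    """Sort by bucketing with a PCF CDF model, recursing into each bucket."""
    n = len(values)
    if n <= small:
        return sorted(values)      # small inputs: use a standard sort
    if min(values) == max(values):
        return list(values)        # all elements equal: already sorted

    # Illustrative parameter choices (assumptions, not taken from the paper).
    cdf = fit_pcf_cdf(values, num_intervals=n)
    num_buckets = math.isqrt(n)

    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        idx = min(int(cdf(v) * num_buckets), num_buckets - 1)
        buckets[idx].append(v)

    out = []
    for b in buckets:
        if len(b) > n // 2:
            # The model split the data poorly: fall back to a standard sort
            # so the recursion depth stays logarithmic.
            out.extend(sorted(b))
        else:
            out.extend(pcf_learned_sort(b, small))
    return out
```

Because the model is monotone non-decreasing, a smaller element never lands in a later bucket than a larger one, so concatenating the sorted buckets yields a sorted list. The fallback to a standard sort keeps the sketch correct even when the data distribution is badly skewed; it only affects running time, not the output.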