🤖 AI Summary
Sample inefficiency is a fundamental bottleneck in large-scale parallel ranking and selection (R&S). Method: The paper proposes a “Cluster-and-Conquer” paradigm that inserts a lightweight, correlation-driven clustering step before the classical divide-and-conquer stage, avoiding both high-precision correlation estimation and stringent clustering assumptions. Contribution/Results: The authors establish optimal sample complexity reduction for a widely used class of efficient large-scale R&S procedures; design a parallel clustering algorithm tailored for R&S; and develop a gradient analysis framework that enables seamless integration with existing R&S pipelines. Experiments on large-scale AI tasks, including neural architecture search, demonstrate substantial reductions in sampling overhead, achieving theoretical optimality and practical performance gains simultaneously.
📝 Abstract
This work seeks to break the sample efficiency bottleneck in parallel large-scale ranking and selection (R&S) problems by leveraging correlation information. We modify the commonly used “divide and conquer” framework in parallel computing by adding a correlation-based clustering step, transforming it into “clustering and conquer”. This seemingly simple modification achieves the optimal sample complexity reduction for a widely used class of efficient large-scale R&S procedures. Our approach enjoys two key advantages: 1) it does not require highly accurate correlation estimation or precise clustering, and 2) it allows for seamless integration with various existing R&S procedures, while achieving optimal sample complexity. Theoretically, we develop a novel gradient analysis framework to analyze sample efficiency and guide the design of large-scale R&S procedures. We also introduce a new parallel clustering algorithm tailored for large-scale scenarios. Finally, in large-scale AI applications such as neural architecture search, our methods demonstrate superior performance.
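To make the "clustering and conquer" idea concrete, here is a toy sketch (not the paper's actual procedure): alternatives are first grouped by the correlation of a small batch of pilot samples, a within-cluster winner is selected per group, and the cluster winners are then compared. The greedy threshold clustering and the mean-based selection rule are illustrative simplifications chosen for brevity; the abstract notes that precise clustering is not required, which this coarse grouping reflects.

```python
import numpy as np

def cluster_by_correlation(samples, threshold=0.5):
    """Greedy clustering: group alternatives whose pilot-sample
    correlation with a seed alternative exceeds `threshold`.
    A coarse grouping is enough for this illustration."""
    corr = np.corrcoef(samples)           # pairwise sample correlations
    unassigned = set(range(samples.shape[0]))
    clusters = []
    while unassigned:
        seed = min(unassigned)
        members = [j for j in unassigned if corr[seed, j] >= threshold]
        clusters.append(members)
        unassigned -= set(members)
    return clusters

def cluster_and_conquer(samples, threshold=0.5):
    """Pick the alternative with the highest sample mean by first
    selecting within each correlation cluster, then comparing the
    cluster winners (the 'conquer' step)."""
    clusters = cluster_by_correlation(samples, threshold)
    means = samples.mean(axis=1)
    winners = [max(c, key=lambda i: means[i]) for c in clusters]
    return max(winners, key=lambda i: means[i])

# Usage: two groups of alternatives driven by shared noise sources,
# so within-group correlation is high and across-group correlation is low.
rng = np.random.default_rng(0)
base1, base2 = rng.normal(size=200), rng.normal(size=200)
samples = np.vstack(
    [base1 + mu + 0.1 * rng.normal(size=200) for mu in (0.0, 0.2, 0.4)]
    + [base2 + mu + 0.1 * rng.normal(size=200) for mu in (0.1, 0.3, 1.0)]
)
best = cluster_and_conquer(samples)  # alternative with the highest true mean
```

In a real R&S deployment, the within-cluster selection step would be a proper fully sequential or indifference-zone procedure rather than a sample-mean comparison, and clusters would be processed in parallel across workers.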