🤖 AI Summary
Pareto optimization of large language models (LLMs) for capability and inference efficiency remains challenging: model-level approaches yield only sparse Pareto sets, while layer-level methods face a curse of dimensionality from their high-dimensional search spaces.
Method: We propose a block-level Pareto set construction framework. First, we introduce a novel hybrid optimal block partitioning strategy that formulates inter-layer optimization as a one-dimensional dynamic programming clustering problem. Second, we design a Bayesian multi-objective evolutionary loop based on the quasi-Expected Hypervolume Improvement (qEHVI) acquisition function to enable fully automated, high-fidelity Pareto front generation.
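The partitioning step above casts inter-layer grouping as optimal 1D clustering of consecutive layers, solvable exactly by dynamic programming. Below is a minimal, hedged sketch of that DP skeleton: it partitions a sequence of per-layer scores into `k` contiguous blocks minimizing within-block sum of squared deviations. The scalar `scores` input and the pure SSE objective are illustrative assumptions; the paper's actual criterion additionally balances intra-block homogeneity against inter-block information distribution.

```python
def block_partition(scores, k):
    """Optimal partition of a 1D score sequence into k contiguous blocks,
    minimizing total within-block sum of squared deviations (SSE).
    Illustrative sketch only; not the paper's exact objective."""
    n = len(scores)
    # Prefix sums of scores and squared scores for O(1) SSE queries.
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, s in enumerate(scores):
        ps[i + 1] = ps[i] + s
        ps2[i + 1] = ps2[i] + s * s

    def sse(i, j):
        # SSE of scores[i..j] inclusive: sum(x^2) - (sum(x))^2 / m
        m = j - i + 1
        total = ps[j + 1] - ps[i]
        return (ps2[j + 1] - ps2[i]) - total * total / m

    INF = float("inf")
    # dp[b][j]: min cost of splitting scores[0..j] into b blocks.
    dp = [[INF] * n for _ in range(k + 1)]
    cut = [[0] * n for _ in range(k + 1)]  # start index of the last block
    for j in range(n):
        dp[1][j] = sse(0, j)
    for b in range(2, k + 1):
        for j in range(b - 1, n):
            for i in range(b - 1, j + 1):  # last block covers scores[i..j]
                c = dp[b - 1][i - 1] + sse(i, j)
                if c < dp[b][j]:
                    dp[b][j] = c
                    cut[b][j] = i
    # Backtrack the block boundaries (inclusive index pairs).
    bounds, j = [], n - 1
    for b in range(k, 0, -1):
        i = cut[b][j] if b > 1 else 0
        bounds.append((i, j))
        j = i - 1
    return bounds[::-1], dp[k][n - 1]
```

For example, scores `[1.0, 1.1, 0.9, 5.0, 5.2, 9.0]` with `k=3` split into blocks `(0,2)`, `(3,4)`, `(5,5)`, grouping layers whose scores are most alike while keeping every block contiguous.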
Contribution/Results: By integrating block-wise parameter merging with multi-objective evolutionary optimization, our method significantly improves Pareto front coverage and quality across multiple LLMs. It achieves an average 32.7% hypervolume gain over state-of-the-art methods and supports agile model selection under joint constraints, including latency, GPU memory, and accuracy.
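The hypervolume gain reported above is measured with the standard hypervolume indicator: the volume of objective space dominated by a Pareto front, relative to a reference point. A minimal two-objective sketch (both objectives minimized, reference point worse than all solutions); this is a generic implementation for illustration, not the paper's evaluation code:

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2D front w.r.t. reference point `ref`.
    Assumes both objectives are minimized and ref dominates nothing."""
    # Keep only non-dominated points, sorted by the first objective.
    pareto, best_y = [], float("inf")
    for x, y in sorted(set(front)):
        if y < best_y:
            pareto.append((x, y))
            best_y = y
    # Sweep from largest x down, adding each point's exclusive rectangle.
    hv, prev_x = 0.0, ref[0]
    for x, y in reversed(pareto):
        hv += (prev_x - x) * (ref[1] - y)
        prev_x = x
    return hv
```

A larger hypervolume means the front pushes further toward the ideal point and/or covers the trade-off curve more densely, which is why it serves as a single scalar summary of Pareto front quality.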
📝 Abstract
Constructing a Pareto set is pivotal for navigating the capability-efficiency trade-offs in Large Language Models (LLMs); however, existing merging techniques remain inadequate for this task. Coarse-grained, model-level methods yield only a sparse set of suboptimal solutions, while fine-grained, layer-wise approaches suffer from the "curse of dimensionality," rendering the search space computationally intractable. To resolve this dichotomy, we propose BAMBO (Bayesian Adaptive Multi-objective Block-wise Optimization), a novel framework that automatically constructs the LLM Pareto set. BAMBO renders the search tractable by introducing a Hybrid Optimal Block Partitioning strategy. Formulated as a 1D clustering problem, this strategy leverages a dynamic programming approach to optimally balance intra-block homogeneity and inter-block information distribution, thereby dramatically reducing dimensionality without sacrificing critical granularity. The entire process is automated within an evolutionary loop driven by the q-Expected Hypervolume Improvement (qEHVI) acquisition function. Experiments demonstrate that BAMBO discovers a superior and more comprehensive Pareto frontier than baselines, enabling agile model selection tailored to diverse operational constraints. Code is available at: https://github.com/xin8coder/BAMBO.