DarwinLM: Evolutionary Structured Pruning of Large Language Models

📅 2025-02-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Structured pruning of large language models (LLMs) faces challenges in balancing computational efficiency, subnetwork capability, and trainability during deployment. Method: This paper proposes a training-aware evolutionary structured pruning framework that dynamically discovers non-uniform (channel- and layer-level) pruning policies via evolutionary search; each generation incorporates lightweight multi-stage fine-tuning to jointly optimize pruned architecture and trainability, and introduces a novel progressive fine-tuning evaluation mechanism for cross-architecture adaptive compression. Results: The method achieves state-of-the-art performance in structured pruning on Llama-2-7B, Llama-3.1-8B, and Qwen-2.5-14B-Instruct. It reduces post-training data requirements by 5× compared to ShearedLlama while delivering superior accuracy.

📝 Abstract
Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective solution by compressing models and directly providing end-to-end speed improvements, regardless of the hardware environment. Meanwhile, different components of the model exhibit varying sensitivities towards pruning, calling for *non-uniform* model compression. However, a pruning method should not only identify a capable substructure, but also account for post-compression training. To this end, we propose DarwinLM, a method for *training-aware* structured pruning. DarwinLM builds upon an evolutionary search process, generating multiple offspring models in each generation through mutation, and selecting the fittest for survival. To assess the effect of post-training, we incorporate a lightweight, multistep training process within the offspring population, progressively increasing the number of tokens and eliminating poorly performing models in each selection stage. We validate our method through extensive experiments on Llama-2-7B, Llama-3.1-8B and Qwen-2.5-14B-Instruct, achieving state-of-the-art performance for structured pruning. For instance, DarwinLM surpasses ShearedLlama while requiring 5× less training data during post-compression training.
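The search loop described in the abstract — mutate a parent policy into offspring, then winnow survivors through selection stages with progressively larger token budgets — can be sketched as a toy. This is an illustrative sketch, not the paper's implementation: the real fitness is the pruned model's loss after lightweight fine-tuning, which is stubbed here with a synthetic function; all names, layer counts, and budgets below are assumptions.

```python
import random

random.seed(0)

N_LAYERS = 8
# Selection stages: (token budget for brief fine-tuning, survivors kept).
# Budgets grow while the population shrinks, as in the paper's description.
STAGES = [(1_000, 8), (10_000, 4), (100_000, 1)]

def random_policy():
    """A pruning policy: one sparsity level per layer (non-uniform)."""
    return [random.uniform(0.3, 0.7) for _ in range(N_LAYERS)]

def mutate(policy):
    """Shift sparsity between two random layers, roughly preserving
    the overall compression budget (clamped to [0.1, 0.9])."""
    child = list(policy)
    i, j = random.sample(range(N_LAYERS), 2)
    delta = random.uniform(-0.05, 0.05)
    child[i] = min(0.9, max(0.1, child[i] + delta))
    child[j] = min(0.9, max(0.1, child[j] - delta))
    return child

def fitness(policy, tokens):
    """Stand-in for 'loss after fine-tuning on `tokens` tokens'.
    Synthetic: layers differ in pruning sensitivity, and evaluation
    noise shrinks as the token budget grows (lower is better)."""
    sensitivity = [1.0 + 0.2 * k for k in range(N_LAYERS)]
    loss = sum(w * s * s for w, s in zip(sensitivity, policy))
    return loss + random.gauss(0, 1.0 / tokens ** 0.5)

def search(generations=20, offspring=8):
    """Evolutionary search with multi-stage elimination per generation."""
    parent = random_policy()
    for _ in range(generations):
        pop = [mutate(parent) for _ in range(offspring)]
        for tokens, keep in STAGES:
            pop.sort(key=lambda p: fitness(p, tokens))
            pop = pop[:keep]          # eliminate poorly performing models
        parent = pop[0]               # fittest offspring survives
    return parent
```

The multi-stage filter is the "training-aware" part: cheap, noisy evaluations prune the population early, and only a few candidates receive the larger (more expensive, more reliable) fine-tuning budgets.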
Problem

Research questions and friction points this paper is trying to address.

Reduces computational costs of Large Language Models
Implements non-uniform structured pruning effectively
Enhances post-compression training efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary search for pruning
Training-aware structured pruning
Multistep lightweight training process
Shengkun Tang
Department of Machine Learning, MBZUAI, Abu Dhabi, UAE
Oliver Sieberling
ETH Zurich, Zurich, Switzerland
Eldar Kurtic
Red Hat AI and IST Austria
Machine Learning
Zhiqiang Shen
Department of Machine Learning, MBZUAI, Abu Dhabi, UAE
Dan Alistarh
Professor at IST Austria
Machine Learning, Algorithms, Distributed Computing