Instruction-Following Pruning for Large Language Models

📅 2025-01-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the trade-off between inference efficiency and task adaptability in large language models (LLMs), this paper proposes an instruction-driven dynamic structured pruning method. Unlike static pruning approaches, our method eliminates fixed parameter masks and instead enables real-time, instruction-dependent selection of parameter subsets. Specifically, a learnable sparse mask predictor conditions on the input instruction to dynamically activate the most relevant parameter modules, while jointly optimizing both instruction-tuning loss and pretraining objectives. Within a structured pruning framework, the pruned model activates only ~3B parameters yet achieves substantial gains—outperforming same-sized dense models by +5–8 points on mathematical reasoning and code generation benchmarks, and matching the performance of a 9B dense model. Our key contributions are: (i) the first instruction-conditioned dynamic pruning paradigm for LLMs, and (ii) a unified solution that simultaneously delivers efficient inference and strong task-specific adaptation.

📝 Abstract
With the rapid scaling of large language models (LLMs), structured pruning has become a widely used technique to learn efficient, smaller models from larger ones, delivering superior performance compared to training similarly sized models from scratch. In this paper, we move beyond the traditional static pruning approach of determining a fixed pruning mask for a model, and propose a dynamic approach to structured pruning. In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction. Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task. To identify and activate effective parameters, we jointly optimize the sparse mask predictor and the LLM, leveraging both instruction-following data and the pre-training corpus. Experimental results demonstrate the effectiveness of our approach on a wide range of evaluation benchmarks. For example, our 3B activated model improves over the 3B dense model by 5-8 points of absolute margin on domains such as math and coding, and rivals the performance of a 9B model.
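The core mechanism described above can be illustrated with a toy sketch: a predictor scores prunable parameter blocks from an instruction embedding and keeps only the top-k, so the active parameter subset changes per input rather than being fixed once. All names, sizes, and the single-linear-layer predictor here are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

D_INSTR = 16   # instruction embedding size (hypothetical)
N_UNITS = 8    # number of prunable blocks, e.g. groups of FFN neurons (hypothetical)
K_ACTIVE = 3   # blocks kept active per instruction (hypothetical)

# Toy mask predictor: one linear layer that scores each block from the instruction.
W_pred = rng.normal(size=(N_UNITS, D_INSTR))

def predict_mask(instr_emb: np.ndarray) -> np.ndarray:
    """Score each block and keep the top-k, giving a structured, input-dependent mask."""
    scores = W_pred @ instr_emb
    mask = np.zeros(N_UNITS)
    mask[np.argsort(scores)[-K_ACTIVE:]] = 1.0
    return mask

# Toy "model layer": each row is one prunable block, gated by the mask.
W_layer = rng.normal(size=(N_UNITS, D_INSTR))

def forward(instr_emb: np.ndarray, x: np.ndarray) -> np.ndarray:
    mask = predict_mask(instr_emb)        # recomputed per input, unlike a static mask
    return (mask[:, None] * W_layer) @ x  # only the selected blocks contribute

instr = rng.normal(size=D_INSTR)  # stand-in for an instruction embedding
x = rng.normal(size=D_INSTR)
y = forward(instr, x)
```

In the paper, the predictor and the LLM are trained jointly (with a differentiable relaxation of the hard top-k selection); this sketch only shows the inference-time gating, where compute for unselected blocks can be skipped entirely.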
Problem

Research questions and friction points this paper is trying to address.

Language Model Pruning
Efficiency
Performance Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Instruction Pruning
Intelligent Predictor
Efficient Model Optimization