AI Summary
This work addresses the lack of theoretical guidance for bias-term selection in parameter-efficient fine-tuning (PEFT) under low-data regimes. We propose an interpretable, causality-driven strategy for selecting bias terms in the query/key/value projection layers, distinct from conventional gradient- or empirical-Fisher-based heuristics. Our method explicitly models the causal relationship between bias-parameter updates and downstream task performance, enabling precise identification of task-critical bias terms. The resulting algorithm is model-agnostic and generalizes across diverse tasks. Evaluated on language models ranging from 110M to 6.7B parameters, it achieves state-of-the-art performance on classification, multiple-choice, and generation tasks. Under identical trainable-parameter budgets, our approach consistently outperforms existing bias-only fine-tuning methods, significantly improving both parameter efficiency and generalization in low-resource settings.
Abstract
Fine-tuning all bias terms stands out among parameter-efficient fine-tuning (PEFT) techniques, owing to its out-of-the-box usability and competitive performance, especially in low-data regimes. Bias-only fine-tuning has the potential for unprecedented parameter efficiency. However, the link between fine-tuning different bias terms (i.e., the bias terms in the query, key, or value projections) and downstream performance remains unclear. Existing approaches, e.g., those based on the magnitude of bias change or on empirical Fisher information, provide limited guidance for selecting which particular bias term to fine-tune. In this paper, we propose an approach for selecting the bias term to be fine-tuned, forming the foundation of our bias-efficient fine-tuning (BEFT). We extensively evaluate our bias-efficient approach against other bias-selection approaches across a wide range of large language models (LLMs), spanning encoder-only and decoder-only architectures from 110M to 6.7B parameters. Our results demonstrate the effectiveness and superiority of our bias-efficient approach on diverse downstream tasks, including classification, multiple-choice, and generation tasks.
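To make the setting concrete: bias-only fine-tuning freezes all weight matrices and updates only a chosen subset of bias vectors (here, the bias of one of the query/key/value projections). The following is a minimal, framework-agnostic sketch of the selection step, not the paper's actual algorithm; the parameter names (`q_proj.bias`, `v_proj.bias`, etc.) are hypothetical, modeled on common transformer naming conventions.

```python
def select_trainable(param_names, target="v_proj.bias"):
    """Return the parameter names to fine-tune: only the bias
    terms of the chosen projection (e.g., the value projection),
    leaving all weights and other biases frozen."""
    return [name for name in param_names if name.endswith(target)]

# Hypothetical parameter names for a two-layer transformer.
params = [
    "layer0.q_proj.weight", "layer0.q_proj.bias",
    "layer0.k_proj.weight", "layer0.k_proj.bias",
    "layer0.v_proj.weight", "layer0.v_proj.bias",
    "layer1.v_proj.weight", "layer1.v_proj.bias",
]

print(select_trainable(params))
# ['layer0.v_proj.bias', 'layer1.v_proj.bias']
```

In a training loop, the returned names would be the only parameters with gradients enabled; everything else stays frozen, which is what yields the extreme parameter efficiency discussed above.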