🤖 AI Summary
This work investigates token efficiency optimization in large language model (LLM) fine-tuning under fixed computational budgets, revealing that data composition—specifically the interplay between sample count and average sequence length—exerts a stronger influence on performance than total token count alone. To address this, we propose the first fine-tuning scaling law that explicitly models data composition, departing from conventional token-count-only assumptions. Empirical analysis on BRICC and MMLU subsets, combined with diverse subsampling strategies and standard scaling law fitting, robustly demonstrates the significant impact of data composition on token efficiency. Our findings yield quantifiable, interpretable theoretical principles and practical guidelines for resource-constrained LLM fine-tuning.
📝 Abstract
We introduce a scaling law for fine-tuning large language models (LLMs) under fixed compute budgets that explicitly accounts for data composition. Conventional approaches measure training data solely by total tokens, yet the number of examples and their average token length (what we term *dataset volume*) play a decisive role in model performance. The parameters of our formulation are fit following established scaling-law fitting procedures. Experiments on the BRICC dataset [salavati2024reducing] and subsets of the MMLU dataset [hendrycks2021measuringmassivemultitasklanguage], evaluated under multiple subsampling strategies, reveal that data composition significantly affects token efficiency. These results motivate refined scaling laws for practical LLM fine-tuning in resource-constrained settings.
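To make the idea concrete, the sketch below fits a hypothetical composition-aware scaling law of the form `loss = A * n^(-alpha) * s^(-beta)`, where `n` is the number of examples and `s` the average sequence length (so `n * s` is the dataset volume). This functional form, the function name, and the synthetic data are assumptions for illustration only, not the paper's actual formulation; they show why separate exponents for `n` and `s` can capture composition effects that a token-count-only law (which depends only on the product `n * s`) cannot.

```python
import numpy as np

def fit_composition_scaling_law(n, s, loss):
    """Fit log(loss) = log(A) - alpha*log(n) - beta*log(s) by least squares.

    n: number of fine-tuning examples per run
    s: average sequence length (tokens) per run
    loss: observed validation loss per run
    """
    # Design matrix for the log-linear model: [1, log n, log s]
    X = np.column_stack([np.ones_like(n, dtype=float), np.log(n), np.log(s)])
    coef, *_ = np.linalg.lstsq(X, np.log(loss), rcond=None)
    log_A, neg_alpha, neg_beta = coef
    return np.exp(log_A), -neg_alpha, -neg_beta

# Synthetic demonstration with known exponents (noiseless, so the fit is exact
# up to numerical precision).
rng = np.random.default_rng(0)
n = rng.integers(1_000, 100_000, size=50).astype(float)
s = rng.integers(64, 2_048, size=50).astype(float)
true_A, true_alpha, true_beta = 5.0, 0.3, 0.1
loss = true_A * n ** (-true_alpha) * s ** (-true_beta)

A, alpha, beta = fit_composition_scaling_law(n, s, loss)
```

If the fitted `alpha` and `beta` differ, performance at a fixed token budget `n * s` depends on how that budget is split between example count and sequence length, which is exactly the composition effect the abstract describes.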