A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates token efficiency in large language model (LLM) fine-tuning under fixed computational budgets, revealing that data composition—specifically the interplay between sample count and average sequence length—exerts a stronger influence on performance than total token count alone. Building on this observation, the authors propose the first fine-tuning scaling law that explicitly models data composition, departing from conventional token-count-only assumptions. Empirical analysis on the BRICC dataset and MMLU subsets, combined with diverse subsampling strategies and standard scaling-law fitting, demonstrates the significant impact of data composition on token efficiency. The findings yield quantifiable, interpretable principles and practical guidelines for resource-constrained LLM fine-tuning.

📝 Abstract
We introduce a scaling law for fine-tuning large language models (LLMs) under fixed compute budgets that explicitly accounts for data composition. Conventional approaches measure training data solely by total tokens, yet the number of examples and their average token length -- what we term *dataset volume* -- play a decisive role in model performance. Our formulation is fitted following established scaling-law procedures. Experiments on the BRICC dataset [salavati2024reducing] and subsets of the MMLU dataset [hendrycks2021measuringmassivemultitasklanguage], evaluated under multiple subsampling strategies, reveal that data composition significantly affects token efficiency. These results motivate refined scaling laws for practical LLM fine-tuning in resource-constrained settings.
Problem

Research questions and friction points this paper is trying to address.

Study token efficiency in LLM fine-tuning under fixed compute
Analyze impact of data composition on model performance
Develop refined scaling laws for resource-constrained LLM fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling law for token efficiency in LLMs
Accounts for dataset volume and composition
Optimizes fine-tuning under fixed compute budgets
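To make the idea of a composition-aware scaling law concrete, here is a minimal sketch of standard scaling-law fitting that treats example count and average sequence length as separate variables. The power-law form `loss ≈ A · n^-α · ℓ^-β`, the synthetic data, and all coefficient values are illustrative assumptions, not the paper's actual parameterization or fitted results.

```python
import numpy as np

# Hypothetical power-law form: loss ≈ A * n^-alpha * ell^-beta, where
# n = number of fine-tuning examples and ell = average tokens per example.
# This decomposes "dataset volume" (n * ell) into its two components.
n = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])   # example counts
ell = np.array([256.0, 256.0, 128.0, 128.0, 64.0])      # avg seq lengths
loss = 3.0 * n ** -0.3 * ell ** -0.1                    # synthetic, noiseless

# Fit in log space via least squares:
# log loss = log A - alpha * log n - beta * log ell
X = np.column_stack([np.ones_like(n), -np.log(n), -np.log(ell)])
coef, *_ = np.linalg.lstsq(X, np.log(loss), rcond=None)
A, alpha, beta = np.exp(coef[0]), coef[1], coef[2]
print(A, alpha, beta)
```

Because the two exponents are fitted independently, such a form can distinguish a budget spent on many short examples from one spent on few long examples, which is the distinction a token-count-only law collapses.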
Ryan Lagasse
University of Connecticut
Aidan Kiernans
University of Connecticut
Avijit Ghosh
University of Connecticut
Shiri Dori-Hacohen
University of Connecticut