Fine-Tuning Improves Information Conveyance in Language Models

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses an oversight in existing research: output length confounds uncertainty analysis in language models, which hinders accurate assessment of how efficiently they convey information. To resolve this, the authors propose Canopy Entropy (CE*), a metric that incorporates output length into an information-theoretic framework by jointly modeling the entropy of sequence content and sequence length. They also introduce a correlation measure between output length and entropy rate to quantify how fine-tuning reshapes the structure of uncertainty. Experiments show that fine-tuned models exhibit a consistently stronger positive correlation between output length and entropy rate, even when total entropy decreases. After controlling for model family, task, prompt, and output length, the correlation between entropy rate and semantic diversity nearly triples under fine-tuning, suggesting that fine-tuned models convert token-level uncertainty into semantic variation more efficiently.
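To see why a joint entropy over content and length can give rise to a length and entropy-rate correlation term, the following sketch applies the chain rule of Shannon entropy. Here $X$ is the prompt, $N$ the output length, and $Y_{1:N}$ the generated sequence. The per-length rate $r_n := H(Y_{1:n} \mid N=n, X)/n$ is an assumed definition chosen for illustration; the paper's exact definition of the entropy rate may differ.

```latex
\begin{align}
H(N, Y_{1:N} \mid X)
  &= H(N \mid X) + H(Y_{1:N} \mid N, X) \\
  &= H(N \mid X) + \sum_{n} P(N = n \mid X)\, H(Y_{1:n} \mid N = n, X) \\
  &= H(N \mid X) + \mathbb{E}[N\, r_N]
     % uses the assumed definition r_n := H(Y_{1:n} | N = n, X) / n
     \\
  &= H(N \mid X) + \mathbb{E}[N]\,\mathbb{E}[r_N] + \operatorname{Cov}(N, r_N) \\
  &= H(N \mid X) + \mathbb{E}[N]\,\mathbb{E}[r_N]
     + \rho(N, r_N)\,\sigma_N\,\sigma_{r_N}
\end{align}
```

Under this assumed definition, a positive $\rho(N, r_N)$ means that longer outputs carry more entropy per token, so the same mean length and mean rate yield a larger total entropy.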

📝 Abstract

Fine-tuning is often believed to reduce uncertainty and diversity in large language models, but existing analyses overlook output length, a key confounder, and therefore fail to capture how uncertainty is distributed across an entire generation rollout. To address this, we propose Canopy Entropy ($\mathrm{CE}^\star$), a measure that views language generation from a tree perspective, where the "canopy" represents the space of all possible rollouts, so that $\mathrm{CE}^\star$ naturally quantifies the effective size of the generation space. $\mathrm{CE}^\star$ jointly captures uncertainty in both the output length $N$ and the generated sequence $Y_{1:N}$; indeed, we show that it equals the total Shannon entropy $H(N, Y_{1:N}\mid X)$, where $X$ denotes the prompt. This formulation yields interpretable metrics, including a length-entropy correlation term $\rho(N, r_N)$, where $r_N$ is the entropy rate. This term quantifies information conveyance efficiency by indicating whether longer outputs are more or less informative per token. Empirically, across tasks and model families, we find that fine-tuned models consistently exhibit a stronger positive correlation $\rho(N, r_N)$, even when total entropy decreases. Furthermore, after controlling for model family, task, prompt, and output-length effects, we find that fine-tuning nearly triples the strength of the correlation between entropy rate and semantic diversity, suggesting that aligned models convert token uncertainty into semantic diversity more efficiently. Overall, these results demonstrate that fine-tuning does not simply reduce uncertainty, but fundamentally reorganizes it into more informative and semantically meaningful generations. Our code is available at https://github.com/WeiyiTian/canopy-entropy.
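As a rough illustration of the quantities named in the abstract, the sketch below estimates a total entropy and a length and entropy-rate correlation from sampled rollouts. It is not the authors' implementation. It assumes each rollout is given as a list of per-token log-probabilities (natural log) under the model that sampled it, and it uses the per-rollout surprisal divided by length as a stand-in for the entropy rate $r_N$. The function names `pearson` and `length_rate_correlation` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def length_rate_correlation(
    rollout_logprobs: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return (total entropy estimate in nats, correlation of length and rate).

    Assumption: each inner sequence holds the log-probabilities of the tokens
    in one rollout sampled from the model for a single prompt.
    """
    if any(len(lp) == 0 for lp in rollout_logprobs):
        raise ValueError("every rollout needs at least one token")

    lengths: List[float] = [float(len(lp)) for lp in rollout_logprobs]
    # Surprisal of a sampled rollout; its mean is a Monte Carlo entropy estimate.
    surprisals = [-sum(lp) for lp in rollout_logprobs]
    # Per-rollout surprisal per token, used here as a proxy for the entropy rate.
    rates = [s / n for s, n in zip(surprisals, lengths)]

    total_entropy = sum(surprisals) / len(surprisals)
    return total_entropy, pearson(lengths, rates)
```

A positive second return value indicates that, in the sample, longer rollouts tend to carry more surprisal per token. Comparing this value for a base model and its fine-tuned counterpart on the same prompts would mirror the kind of comparison the abstract describes, though the paper's estimator may be constructed differently.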

Problem

Research questions and friction points this paper is trying to address.

fine-tuning

uncertainty

output length

information conveyance

semantic diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Canopy Entropy

fine-tuning

information conveyance