🤖 AI Summary
Existing decoding strategies—such as Top-k and nucleus sampling—are empirically effective but lack theoretical grounding, whereas theoretically optimal approaches like maximum a posteriori (MAP) decoding underperform in practice, revealing a fundamental theory-practice gap. Method: This paper introduces the first minimax game-theoretic framework for text generation, modeling decoding as a zero-sum game between a strategic agent and an adversarial “nature” that perturbs the model’s output distribution. Contribution/Results: Within this framework, the authors prove that greedy search and temperature scaling are near-optimal under specific adversarial regularization; truncation-and-renormalization emerges naturally as implicit regularization; and Top-k and nucleus sampling are rigorously shown to be first-order optimal approximations. They derive a closed-form, single-step optimal strategy analytically and validate its alignment with empirical decoding behavior through numerical experiments.
📝 Abstract
Decoding strategies play a pivotal role in text generation for modern language models, yet a puzzling gap divides theory and practice. Surprisingly, strategies that should intuitively be optimal, such as Maximum a Posteriori (MAP), often perform poorly in practice. Meanwhile, popular heuristic approaches like Top-$k$ and Nucleus sampling, which employ truncation and normalization of the conditional next-token probabilities, have achieved great empirical success but lack theoretical justification. In this paper, we propose Decoding Game, a comprehensive theoretical framework that reimagines text generation as a two-player zero-sum game between Strategist, who seeks to produce text credible in the true distribution, and Nature, who distorts the true distribution adversarially. After discussing the decomposability of multi-step generation, we derive the optimal strategy in closed form for the one-step Decoding Game. It is shown that the adversarial Nature imposes an implicit regularization on likelihood maximization, and truncation-normalization methods are first-order approximations to the optimal strategy under this regularization. Additionally, by generalizing the objective and parameters of Decoding Game, near-optimal strategies encompass diverse methods such as greedy search, temperature scaling, and hybrids thereof. Numerical experiments are conducted to complement our theoretical analysis.
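For readers unfamiliar with the heuristics the abstract refers to, the truncation-and-renormalization step shared by Top-$k$ and Nucleus sampling can be sketched as follows. This is a minimal illustration of the standard definitions of these samplers, not code from the paper; the function names and the plain-list representation of the distribution are ours.

```python
def top_k_filter(probs, k):
    """Keep the k highest-probability tokens; renormalize them to sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    truncated = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    z = sum(truncated)  # mass of the surviving tokens
    return [p / z for p in truncated]


def nucleus_filter(probs, top_p):
    """Keep the smallest high-probability set whose total mass reaches
    top_p; renormalize the survivors to sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    truncated = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    z = sum(truncated)
    return [p / z for p in truncated]


# Example next-token distribution over a 5-token vocabulary:
probs = [0.5, 0.3, 0.1, 0.05, 0.05]
print(top_k_filter(probs, 2))       # keeps the top-2 tokens, renormalized
print(nucleus_filter(probs, 0.8))   # keeps the smallest set with mass >= 0.8
```

In both cases the tail of the distribution is zeroed out and the surviving head is rescaled; the paper's result is that this head-keeping operation is a first-order approximation to the closed-form optimal strategy under adversarial regularization.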