The End of Manual Decoding: Towards Truly End-to-End Language Models

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language models (LLMs), though often described as "end-to-end," rely on non-differentiable, post-hoc decoding strategies such as fixed-temperature and top-p sampling, which require manual hyperparameter tuning and thus prevent true end-to-end optimization. To address this, we propose AutoDeco: the first differentiable decoding framework that integrates dynamic decoding control directly into the model architecture. Built on a standard Transformer, AutoDeco adds a lightweight prediction head that jointly outputs next-token logits and token-level adaptive temperature and top-p values, enabling instruction-driven, fine-grained decoding control. Across eight benchmark tasks, AutoDeco significantly outperforms fixed-parameter decoding and approaches the performance of oracle baselines tuned on the test sets, achieving, for the first time, fully end-to-end text generation without human intervention or post-hoc tuning.

📝 Abstract
The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a non-differentiable decoding process that requires laborious hand-tuning of hyperparameters like temperature and top-p. This paper introduces AutoDeco, a novel architecture that enables truly "end-to-end" generation by learning to control its own decoding strategy. We augment the standard transformer with lightweight heads that, at each step, dynamically predict context-specific temperature and top-p values alongside the next-token logits. This approach transforms decoding into a parametric, token-level process, allowing the model to self-regulate its sampling strategy within a single forward pass. Through extensive experiments on eight benchmarks, we demonstrate that AutoDeco not only significantly outperforms default decoding strategies but also achieves performance comparable to an oracle-tuned baseline derived from "hacking the test set", a practical upper bound for any static method. Crucially, we uncover an emergent capability for instruction-based decoding control: the model learns to interpret natural language commands (e.g., "generate with low randomness") and adjusts its predicted temperature and top-p on a token-by-token basis, opening a new paradigm for steerable and interactive LLM decoding.
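The token-level mechanism the abstract describes can be sketched as follows. This is a minimal illustration of per-step temperature scaling plus nucleus (top-p) filtering, not the paper's implementation; `sample_with_predicted_params` is a hypothetical helper, and in an AutoDeco-style model the two scalars would come from the learned heads rather than being passed in.

```python
import numpy as np

def sample_with_predicted_params(logits, temperature, top_p, rng=None):
    """Sample one token using per-step temperature and top-p values."""
    rng = rng or np.random.default_rng()
    # Temperature scaling followed by a numerically stable softmax.
    scaled = logits / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Nucleus (top-p) filtering: keep the smallest prefix of tokens,
    # sorted by probability, whose cumulative mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cumulative, top_p)) + 1]
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))
```

Because temperature and top-p are recomputed at every step, a sharp (low-temperature, low-top-p) prediction collapses sampling onto the argmax token, while a flatter prediction widens the nucleus.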
Problem

Research questions and friction points this paper is trying to address.

Manual hyperparameter tuning is required for language model decoding
Static decoding strategies cannot adapt during generation
LLMs lack instruction-based control over sampling randomness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns context-specific decoding parameters dynamically
Enables token-level self-regulation in single forward pass
Interprets natural language commands for decoding control
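A minimal sketch of what a lightweight decoding-control head could look like, assuming a simple linear projection of the final hidden state to two bounded scalars. The class name, initialization, and output ranges are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

class DecodingHead:
    """Hypothetical lightweight head mapping a hidden state to
    (temperature, top_p), computed alongside the usual logits."""

    def __init__(self, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # A single linear projection from the hidden state to two scalars.
        self.w = rng.normal(0.0, 0.02, size=(hidden_dim, 2))
        self.b = np.zeros(2)

    def __call__(self, hidden_state):
        raw = hidden_state @ self.w + self.b
        squashed = 1.0 / (1.0 + np.exp(-raw))      # sigmoid -> (0, 1)
        temperature = 0.1 + 1.9 * squashed[0]      # assumed range (0.1, 2.0)
        top_p = 0.1 + 0.9 * squashed[1]            # assumed range (0.1, 1.0)
        return float(temperature), float(top_p)
```

Bounding the outputs with a sigmoid keeps the predicted parameters in a sane sampling range regardless of the hidden state, which is one plausible way to make such a head trainable end-to-end.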