GUARD: Glocal Uncertainty-Aware Robust Decoding for Effective and Efficient Open-Ended Text Generation

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
In open-ended text generation, balancing coherence and diversity remains challenging: existing contrastive-search decoding methods rely on sensitive hyperparameters and incur high computational overhead. This paper proposes GUARD, a self-adaptive decoding framework whose "Glocal" uncertainty model jointly tracks global entropy (capturing overall uncertainty) and local entropy deviation (reflecting step-level instability), with theoretical guarantees of unbiasedness and consistency for the entropy estimate. A lightweight token-count penalty further reduces computational cost. GUARD removes the need for manual hyperparameter tuning while preserving generation quality and accelerating inference. Experiments across multiple benchmarks show that GUARD consistently outperforms mainstream decoding strategies, including greedy decoding, beam search, top-k sampling, nucleus sampling, and contrastive search, in diversity, coherence, and holistic quality, as validated by both human evaluation and LLM-based automatic assessment.

📝 Abstract
Open-ended text generation faces a critical challenge: balancing coherence with diversity in LLM outputs. While contrastive search-based decoding strategies have emerged to address this trade-off, their practical utility is often limited by hyperparameter dependence and high computational costs. We introduce GUARD, a self-adaptive decoding method that effectively balances these competing objectives through a novel "Glocal" uncertainty-driven framework. GUARD combines global entropy estimates with local entropy deviations to integrate both long-term and short-term uncertainty signals. We demonstrate that our proposed global entropy formulation effectively mitigates abrupt variations in uncertainty, such as sudden overconfidence or high entropy spikes, and provides theoretical guarantees of unbiasedness and consistency. To reduce computational overhead, we incorporate a simple yet effective token-count-based penalty into GUARD. Experimental results demonstrate that GUARD achieves a good balance between text diversity and coherence, while exhibiting substantial improvements in generation speed. In a more nuanced comparison study across different dimensions of text quality, both human and LLM evaluators validated its remarkable performance. Our code is available at https://github.com/YecanLee/GUARD.
Problem

Research questions and friction points this paper is trying to address.

Balancing coherence and diversity in text generation
Reducing hyperparameter dependence and computational costs
Mitigating abrupt uncertainty variations in model outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Glocal uncertainty-driven self-adaptive decoding framework
Combines global entropy with local entropy deviations
Token-count-based penalty reduces computational overhead
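The mechanism in the bullets above can be sketched in a few lines. The following is a minimal, hypothetical Python sketch of one GUARD-style decoding step, not the authors' implementation: it maintains a running global-entropy estimate, measures the local entropy deviation at each step to set an adaptive penalty weight, and uses a cheap token-count penalty in place of contrastive search's embedding-similarity computation. All function names and the exact weighting scheme are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def guard_step(probs, generated_ids, global_h_sum, step, beta=1.0):
    """One decoding step of a hypothetical GUARD-style scheme.

    probs:         model's next-token distribution (1-D array over the vocab).
    generated_ids: token ids emitted so far (used for the count penalty).
    global_h_sum:  running sum of per-step entropies.
    step:          0-based index of the current step.
    Returns (chosen_token_id, updated_global_h_sum).
    """
    h_t = entropy(probs)                      # local (per-step) entropy
    global_h_sum += h_t
    global_h = global_h_sum / (step + 1)      # running global entropy estimate
    local_dev = abs(h_t - global_h)           # local entropy deviation

    # Adaptive weight (an assumption): penalize more when uncertainty
    # deviates sharply from its global level, i.e. local instability.
    alpha = min(local_dev / (global_h + 1e-9), 1.0)

    # Token-count penalty: discount tokens by how often they already
    # appeared, avoiding any pairwise similarity computation.
    counts = np.zeros_like(probs)
    for t in generated_ids:
        counts[t] += 1
    freq = counts / max(len(generated_ids), 1)

    score = (1 - alpha) * probs - alpha * beta * freq
    return int(np.argmax(score)), global_h_sum
```

At the first step there is no history, so the deviation is zero, the penalty weight vanishes, and the step reduces to greedy decoding; as repeated tokens accumulate and entropy fluctuates, the count penalty increasingly discourages degenerate repetition.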