🤖 AI Summary
This work addresses the challenge that large language models struggle to produce structured outputs during natural generation, while conventional constrained decoding often impairs their reasoning capabilities. To reconcile this trade-off, the authors propose a trigger-token-based two-stage decoding framework: in the first stage, the model retains full expressive power through unconstrained generation; upon detecting predefined trigger tokens, it dynamically switches to a structured generation mode, combining free-form reasoning with constrained output generation in a single pass. Evaluated across multiple classification and reasoning tasks, the method improves accuracy by up to 27% over natural generation while incurring minimal overhead of only 10–20 additional tokens, thus balancing generative flexibility with output reliability.
📝 Abstract
Natural generation allows Language Models (LMs) to produce free-form responses with rich reasoning, but the lack of guaranteed structure makes outputs difficult to parse or verify. Structured generation, or constrained decoding, addresses this drawback by producing content in standardized formats such as JSON, ensuring consistency and guaranteed-parsable outputs, but it can inadvertently restrict the model's reasoning capabilities. In this work, we propose a simple approach that combines the advantages of both natural and structured generation. By allowing LMs to reason freely until specific trigger tokens are generated, and then switching to structured generation, our method preserves the expressive power of natural language reasoning while ensuring the reliability of structured outputs. We further evaluate our approach on several datasets, covering both classification and reasoning tasks, to demonstrate its effectiveness, achieving a substantial gain of up to 27% in accuracy compared to natural generation, while requiring only a small overhead of 10–20 extra tokens.
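The two-stage switch described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the trigger token `<answer>`, the toy allowed vocabulary, and the `step` callback are all hypothetical stand-ins for a real model and a real constrained-decoding grammar.

```python
# Sketch of trigger-token two-stage decoding (all names are illustrative).
# Stage 1: sample tokens without constraints (free-form reasoning).
# Stage 2: once the trigger token appears, restrict sampling to the
#          vocabulary permitted by the structured output format.

TRIGGER = "<answer>"                                 # hypothetical trigger token
ALLOWED_AFTER_TRIGGER = {"yes", "no", "</answer>"}   # toy structured vocabulary


def two_stage_decode(step, max_tokens=20):
    """step(prefix, allowed) -> next token; `allowed=None` means unconstrained."""
    out, constrained = [], False
    for _ in range(max_tokens):
        tok = step(out, ALLOWED_AFTER_TRIGGER if constrained else None)
        out.append(tok)
        if tok == TRIGGER:       # trigger detected: switch to structured mode
            constrained = True
        if tok == "</answer>":   # structured output is complete
            break
    return out


# Deterministic toy "model": reasons briefly, emits the trigger, then answers.
def toy_step(prefix, allowed):
    script = ["Let's", "think.", TRIGGER, "yes", "</answer>"]
    tok = script[len(prefix)]
    if allowed is not None and tok not in allowed:
        tok = sorted(allowed)[0]  # fall back to some allowed token
    return tok


print(two_stage_decode(toy_step))
# → ["Let's", 'think.', '<answer>', 'yes', '</answer>']
```

In a real system, `step` would wrap the model's sampling loop (e.g. by masking logits of disallowed tokens once the trigger fires), and the allowed set would come from a format grammar such as a JSON schema rather than a fixed list.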