🤖 AI Summary
This work addresses a critical challenge in information extraction from large language models: their freely generated outputs often lack consistent formatting, making reliable answer extraction difficult. To tackle this issue, the paper introduces suffix constraints into the generation process of causal language models for the first time, proposing a greedy search–based constrained generation algorithm. By enforcing that model outputs conform to predefined templates, the method enables deterministic parsing of answers while preserving the autoregressive inference mechanism. Experimental results across multiple datasets demonstrate that this approach not only guarantees parseable outputs but also maintains, and in some cases improves, model accuracy on tasks such as mathematical question answering.
📝 Abstract
Large language models (LLMs) are powerful tools that have found applications beyond human-machine interfaces and chatbots. In particular, their ability to generate reasoning traces has motivated their use in many prediction tasks such as math question answering. Unfortunately, extracting the final answer from an LLM's free-form output is difficult, as it is an information extraction problem in its own right.
In this work, we introduce suffix-constrained generation, which aims to produce well-formed LLM responses in which final answers follow strict templates and are guaranteed to be trivially parseable. To this end, we introduce several algorithms based on greedy search procedures. We experiment on several datasets and show that our approach guarantees trivial, deterministic extraction of the final answer from an LLM output without degrading results, and in some cases even improving them.
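The core idea above can be sketched as follows. This is a minimal toy illustration, not the paper's actual algorithm: the model is replaced by a hypothetical scoring function, and the constraint is enforced naively by forcing the template suffix once only enough decoding slots remain to fit it, so the output is guaranteed to end with the parseable template.

```python
# Sketch of suffix-constrained greedy decoding over a toy vocabulary.
# All names here (toy_next_token_scores, suffix_constrained_greedy) are
# hypothetical stand-ins, not the paper's implementation.

def toy_next_token_scores(prefix, vocab):
    """Stand-in for a causal LM: favors tokens extending a canned trace."""
    canned = ["we", "compute", "2+2", "so", "4", "done"]
    i = len(prefix)
    return {t: (1.0 if i < len(canned) and t == canned[i] else 0.0)
            for t in vocab}

def suffix_constrained_greedy(score_fn, vocab, suffix, max_len):
    """Greedy decoding that guarantees the output ends with `suffix`:
    free-form tokens are chosen by argmax until exactly len(suffix)
    positions remain, at which point the template suffix is forced."""
    out = []
    while len(out) < max_len:
        remaining = max_len - len(out)
        if remaining == len(suffix):
            out.extend(suffix)  # force the answer template
            break
        scores = score_fn(out, vocab)
        out.append(max(vocab, key=lambda t: scores[t]))
    return out

vocab = ["we", "compute", "2+2", "so", "4", "done", "answer:"]
suffix = ["answer:", "4"]
output = suffix_constrained_greedy(toy_next_token_scores, vocab, suffix, 6)
# The trailing tokens always match the template, so the final answer
# can be extracted deterministically by string matching.
```

Because the suffix is appended verbatim, a downstream parser only needs to look for the fixed template at the end of the output; no free-form information extraction is required.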