🤖 AI Summary
Existing locally constrained decoding (LCD) methods for structured text generation (e.g., JSON) suffer from sampling bias because they mask next tokens myopically. This work proposes Globally Constrained Decoding (GCD) and its probabilistic extension (P-GCD). GCD tensorizes finite automata so that constraints can be checked efficiently. P-GCD exploits the fact that tensorized finite automata share the circuit structure of Hidden Markov Models (HMMs), and circuit-multiplies the two to fuse the logical constraint with probabilistic information from the HMM. The resulting proposal distributions (and potentials) for Sequential Monte Carlo (SMC) are efficient and less biased than LCD proposals, and they are amenable to GPU acceleration. On function calling, keyword-based generation, and SQL generation, (P-)GCD converges to the target distribution faster and with significantly fewer particles than LCD proposals under the same SMC setup.
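To make "myopic masking" concrete, here is a minimal sketch. It tensorizes a hypothetical toy DFA (the constraint "at least three `a` tokens") as a one-hot transition tensor `T[s][v][s2]`. It then compares a local mask with a global one. We assume, for illustration only, that a global mask admits a token only if the constraint can still be satisfied within the remaining token budget. This is our reading, not necessarily the paper's exact definition of GCD. All names (`VOCAB`, `step`, `feasibility_table`, `local_mask`, `global_mask`) are hypothetical.

```python
# Hypothetical toy constraint: "at least three 'a' tokens" over the vocabulary {a, b}.
VOCAB = ["a", "b"]
NUM_STATES = 4          # DFA state = number of 'a' tokens seen so far, capped at 3
ACCEPTING = {3}


def step(state, token):
    """Deterministic DFA transition function."""
    return min(state + 1, 3) if token == "a" else state


# Tensorized DFA: T[s][v][s2] == 1 iff reading VOCAB[v] in state s moves to s2.
T = [[[1 if step(s, tok) == s2 else 0 for s2 in range(NUM_STATES)]
      for tok in VOCAB]
     for s in range(NUM_STATES)]


def feasibility_table(budget):
    """F[k][s] == 1 iff some continuation of exactly k tokens takes state s
    into ACCEPTING; each row is one contraction of T with the previous row."""
    F = [[1 if s in ACCEPTING else 0 for s in range(NUM_STATES)]]
    for _ in range(budget):
        prev = F[-1]
        F.append([int(any(T[s][v][s2] and prev[s2]
                          for v in range(len(VOCAB))
                          for s2 in range(NUM_STATES)))
                  for s in range(NUM_STATES)])
    return F


def local_mask(state):
    """Myopic mask: allow every token that has a DFA transition from `state`."""
    return [any(T[state][v]) for v in range(len(VOCAB))]


def global_mask(state, remaining, F):
    """Allow token v only if the constraint stays satisfiable with the
    `remaining - 1` tokens left after it (requires remaining >= 1)."""
    return [any(T[state][v][s2] and F[remaining - 1][s2] for s2 in range(NUM_STATES))
            for v in range(len(VOCAB))]


F = feasibility_table(budget=3)
print(local_mask(0))                     # [True, True]
print(global_mask(0, remaining=3, F=F))  # [True, False]: "b" first leaves too few slots
```

With a three-token budget, the local mask lets `b` through as the first token. Any particle that takes it can never satisfy the constraint, whereas the global mask rules it out in advance. Plain nested lists keep the sketch dependency-free. A GPU implementation would instead store `T` as a (sparse) tensor and compute each row of `F` with a batched contraction.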
📝 Abstract
Generations from large language models often fail to conform to desired constraints such as a JSON schema. Existing locally constrained decoding (LCD) approaches enforce constraints by myopically masking out next tokens, resulting in biased sampling and degraded performance. Recent work uses sequential Monte Carlo (SMC) methods to mitigate such biases, but designing effective proposal distributions or potential functions remains a key challenge. In this work, we propose a generic approach to construct proposals and potentials for SMC sampling from $p_{\mathrm{lm}}(\cdot \mid \mathrm{constraint})$. First, we show that constraints specified as finite automata can be tensorized for efficient execution on GPUs, which we use to construct globally constrained decoding (GCD) proposals. In addition, leveraging the fact that tensorized finite automata share the same circuit structure as hidden Markov models, we circuit-multiply them to obtain probabilistic GCD (P-GCD) proposals encoding both logical and probabilistic information about the target distributions. We evaluate (P-)GCD on the tasks of function calling, keyword-based generation, and SQL generation. Experiments show that under the same SMC sampling setup, compared to LCD proposals, (P-)GCD converges faster to the target distribution with significantly fewer particles.
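The following minimal sketch shows how a product of an HMM and a DFA can serve as an SMC proposal and potential when sampling from $p_{\mathrm{lm}}(\cdot \mid \mathrm{constraint})$. It is one plausible instantiation in the style of twisted SMC, assumed for illustration; the paper's exact proposal and potential definitions may differ.

- `product_backward` runs a backward pass over joint (HMM state, DFA state) pairs, standing in for the circuit product.
- The potential of a prefix is the HMM's probability that the constraint will still be satisfied.
- The proposal is $q(v) \propto p_{\mathrm{lm}}(v \mid x_{<t})\,\psi(x_{<t} v)$.
- The incremental weight is $Z_t / \psi(x_{<t})$.

Mode `"gcd"` keeps only the 0/1 feasibility part of the potential.

The HMM parameters, the DFA ("at least two `a` tokens"), and every function name are hypothetical. The "LM" is the toy HMM itself, so results can be checked by brute force.

```python
import math
import random
from itertools import product

# Hypothetical toy HMM over a 2-token vocabulary (values chosen for illustration).
VOCAB = ["a", "b"]
PI = [0.6, 0.4]                   # initial hidden-state distribution
A = [[0.7, 0.3], [0.2, 0.8]]      # transitions A[h][h2]
B = [[0.9, 0.1], [0.3, 0.7]]      # emissions B[h][v]
H = len(PI)

# Hypothetical DFA constraint: at least two "a" tokens (state = count, capped at 2).
NUM_STATES, ACCEPT = 3, {2}


def delta(s, v):
    return min(s + 1, 2) if VOCAB[v] == "a" else s


def hmm_step(alpha, v):
    """Condition P(z_t | x_<t) on token v, then propagate to P(z_{t+1} | x_<=t)."""
    post = [alpha[h] * B[h][v] for h in range(H)]
    z = sum(post)
    return [sum(post[h] * A[h][h2] for h in range(H)) / z for h2 in range(H)]


def product_backward(max_len):
    """beta[k][h][s] = P(next k tokens drive the DFA from s into ACCEPT | hidden state h),
    a backward pass over the product (HMM x DFA) state space."""
    beta = [[[1.0 if s in ACCEPT else 0.0 for s in range(NUM_STATES)] for _ in range(H)]]
    for _ in range(max_len):
        prev = beta[-1]
        beta.append([[sum(B[h][v] * sum(A[h][h2] * prev[h2][delta(s, v)] for h2 in range(H))
                          for v in range(len(VOCAB)))
                      for s in range(NUM_STATES)]
                     for h in range(H)])
    return beta


def lm_next(prefix):
    """Stand-in for p_lm(. | prefix); here the 'LM' is the toy HMM itself."""
    alpha = PI[:]
    for v in prefix:
        alpha = hmm_step(alpha, v)
    return [sum(alpha[h] * B[h][v] for h in range(H)) for v in range(len(VOCAB))]


def smc(num_particles, length, mode, rng):
    """SMC targeting p_lm(x | constraint) over sequences of exactly `length` tokens."""
    beta = product_backward(length)
    particles = [((), 0, PI[:], 1.0)] * num_particles  # (tokens, DFA state, HMM filter, potential)
    log_z = 0.0
    for t in range(length):
        k = length - t                                   # tokens left, including this one
        new, weights = [], []
        for toks, s, alpha, psi_prev in particles:
            p = lm_next(toks)
            nxt = [hmm_step(alpha, v) for v in range(len(VOCAB))]
            # Potential of each extension: HMM probability that the constraint still holds.
            psi = [sum(nxt[v][h] * beta[k - 1][h][delta(s, v)] for h in range(H))
                   for v in range(len(VOCAB))]
            if mode == "gcd":                            # keep only 0/1 feasibility
                psi = [1.0 if x > 0 else 0.0 for x in psi]
            scores = [p[v] * psi[v] for v in range(len(VOCAB))]
            v = rng.choices(range(len(VOCAB)), weights=scores)[0]  # proposal: p_lm * psi
            new.append((toks + (v,), delta(s, v), nxt[v], psi[v]))
            weights.append(sum(scores) / psi_prev)       # incremental importance weight
        log_z += math.log(sum(weights) / num_particles)
        particles = rng.choices(new, weights=weights, k=num_particles)  # multinomial resampling
    return particles, math.exp(log_z)


def exact_prob_accept(length):
    """Brute-force P_lm(constraint), used to check the SMC estimate."""
    total = 0.0
    for seq in product(range(len(VOCAB)), repeat=length):
        p, s, alpha = 1.0, 0, PI[:]
        for v in seq:
            p *= sum(alpha[h] * B[h][v] for h in range(H))
            alpha, s = hmm_step(alpha, v), delta(s, v)
        total += p if s in ACCEPT else 0.0
    return total


rng = random.Random(0)
samples, z_pgcd = smc(16, 4, "pgcd", rng)
_, z_gcd = smc(16, 4, "gcd", rng)
print(exact_prob_accept(4), z_pgcd, z_gcd)
```

Because the toy HMM here *is* the LM, the product-based potential is exact. Every incremental weight after the first step is then 1, and the `"pgcd"` estimate of $P_{\mathrm{lm}}(\mathrm{constraint})$ matches the brute-force value up to floating-point error. The `"gcd"` run also returns only constraint-satisfying samples, but its weights still vary across particles. With a real LM and an approximating HMM, the match would no longer be exact. The sketch only illustrates why a more informative proposal can need fewer particles.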