🤖 AI Summary
This study addresses the difficulty autoregressive models face in modeling conditional distributions under global structural constraints (such as rhyme schemes, fixed metrical patterns, or fill-in-the-blank templates), where common decoding procedures give incomplete coverage of the solution space and distorted probabilities over valid completions. The work formally defines exact inference tasks under such constraints and establishes their computational complexity: sentence-level MAP decoding is NP-hard, and exact normalized sampling under regular-language constraints is #P-hard. For succinctly represented models whose next-token probabilities are computable in polynomial time, the paper shows that these tasks admit no bounded-state dynamic program, a fundamental distinction from finite-state Markov models. The analysis thus draws a theoretical boundary: local sampling remains tractable, while exact inference under global constraints is intractable in general.
📝 Abstract
Large language and music models are increasingly used for constrained generation: rhyming lines, fixed meter, inpainting or infilling, positional endings, and other global form requirements. These systems often perform strikingly well, but the induced procedures are usually not exact conditioning of the underlying autoregressive model. This creates a hidden inferential bias, distinct from the better-known notion of bias inherited from the training set: samples are distorted relative to the true constrained distribution, with no generic guarantee of complete coverage of the admissible solution space or of correct conditional probabilities over valid completions. We formalize several exact inference tasks for autoregressive models and prove corresponding hardness results. For succinctly represented autoregressive models whose next-token probabilities are computable in polynomial time, exact sentence-level maximum a posteriori (MAP) decoding is NP-hard. This hardness persists under unary and metrical constraints. On the sampling side, exact conditioned normalization is #P-hard even for regular constraints such as fixed-length terminal events. Unlike finite-state Markov models, general autoregressive models do not admit a bounded-state dynamic program for these tasks. These results formalize a standard claim in the neural decoding literature: local autoregressive sampling is easy, whereas exact decoding and exact conditioning under global form constraints are computationally intractable in general.
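The "hidden inferential bias" described above can be made concrete on a toy example. The sketch below (all probabilities and the constraint are hypothetical, chosen only to make the gap visible) compares two distributions over length-2 strings from a tiny autoregressive model: exact conditioning on a global constraint (renormalizing the joint over all valid strings) versus the common token-wise procedure that masks locally invalid tokens and renormalizes at each step. The two disagree, illustrating that local masking is not exact conditioning.

```python
from itertools import product

# Toy autoregressive model over the alphabet {a, b}, sequence length 2.
# All numbers are hypothetical, picked only to make the bias visible.
P_FIRST = {"a": 0.9, "b": 0.1}                      # p(x1)
P_SECOND = {"a": {"a": 0.5, "b": 0.5},              # p(x2 | x1 = a)
            "b": {"a": 0.01, "b": 0.99}}            # p(x2 | x1 = b)

def joint(x1, x2):
    """Unconstrained autoregressive probability p(x1, x2)."""
    return P_FIRST[x1] * P_SECOND[x1][x2]

def valid(x1, x2):
    """Global constraint: the string must end in 'b'."""
    return x2 == "b"

# Exact conditioning: renormalize the joint over all valid strings.
Z = sum(joint(x1, x2) for x1, x2 in product("ab", repeat=2) if valid(x1, x2))
exact = {(x1, x2): joint(x1, x2) / Z
         for x1, x2 in product("ab", repeat=2) if valid(x1, x2)}

# Token-wise masked sampling: at each step, keep only tokens that still
# admit some valid completion and renormalize locally.  Step 1 is
# unaffected (either prefix can still end in 'b'); step 2 is forced to 'b'
# with local probability 1, so the induced distribution over full strings
# is just p(x1).
local = {(x1, "b"): P_FIRST[x1] for x1 in "ab"}

for s in sorted(exact):
    print(f"{''.join(s)}  exact={exact[s]:.3f}  local={local[s]:.3f}")
```

Here exact conditioning gives roughly 0.820 / 0.180 over "ab" / "bb", while the locally masked sampler gives 0.900 / 0.100: both cover the admissible set, but the probabilities are distorted. The hardness results in the paper say this gap cannot, in general, be closed efficiently.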