🤖 AI Summary
This work addresses exact generation of sequences from variable-order Markov models under regular constraints, such as fixed positions and forbidden copied fragments. Belief propagation already handles regular constraints exactly for first-order Markov chains; the contribution here is the variable-order extension, which identifies the state space on which that machinery must run when the generator is a variable-order (backoff) model. The method replaces the first-order Markov state with the observed context state and takes the standard product with the constraint automaton, a sparse construction that keeps apart the histories the generator distinguishes without enumerating all K-tuples. For a fixed trained context graph and automaton, inference is linear in the sequence horizon; in general it is polynomial in the number of reachable product edges. The same interface supports reversible data augmentation by inverse count lookup, matching materialized transposition augmentation without storing transformed corpora. The paper also separates exact BP inference from generation-time backoff policies, such as singleton avoidance, whose stochastic semantics must be stated explicitly if exactness is claimed.
📝 Abstract
Variable-order Markov models generate sequences over a finite alphabet by conditioning each symbol on the longest available suffix of the generated history. Regular constraints, by contrast, describe finite-horizon control requirements by an automaton: fixed positions, forced endings, metrical patterns, and forbidden copied fragments are all special cases. Existing exact methods already handle regular constraints with belief propagation for first-order Markov chains. The contribution here is the variable-order extension: identifying the state space on which the existing BP-regular machinery must be run when the generator is a variable-order/backoff model. A first-order constraint layer can enforce useful support conditions, but it computes future mass after merging histories that a variable-order generator deliberately keeps distinct. We formalize this mismatch and give the sparse construction obtained by replacing the first-order Markov state with the observed context state, then taking the standard product with the regular constraint automaton. For a fixed trained context graph and automaton, inference is linear in the sequence horizon; in general it is polynomial in the number of reachable product edges. This gives the correct variable-order distribution conditioned on regular constraints without expanding to all K-tuples. The same finite-source interface supports reversible data augmentation by inverse count lookup, matching materialized transposition augmentation without storing transformed corpora. We also separate exact BP inference from generation-time backoff policies, such as singleton avoidance, whose stochastic semantics must be made explicit if exactness is claimed.
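To make the construction concrete, the following is a minimal Python sketch of the idea described above, not the paper's implementation. It assumes a plain maximum-likelihood model that conditions on the longest observed suffix (no smoothing and no singleton-avoidance policy), a context table built from every corpus substring of length at most K, and a constraint given as a deterministic automaton. All names (`train`, `longest_context`, `moves`, `backward`, `sample`) and the toy corpus are hypothetical. The backward pass computes, for each reachable pair of context state and automaton state, the probability mass of completions that satisfy the constraint; sampling then reweights each model transition by that mass.

```python
import random
from collections import defaultdict

def train(corpus, K):
    """Count next symbols after every observed context of length 0..K."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in corpus:
        for i, s in enumerate(seq):
            for k in range(min(K, i) + 1):
                counts[tuple(seq[i - k:i])][s] += 1
    return {c: dict(d) for c, d in counts.items()}

def longest_context(counts, history, K):
    """Longest suffix of `history` (length <= K) observed as a context."""
    for k in range(min(K, len(history)), -1, -1):
        ctx = tuple(history[len(history) - k:])
        if ctx in counts:
            return ctx
    raise ValueError("empty corpus: no contexts observed")

def moves(counts, K, delta, c, q):
    """Yield (symbol, model probability, next product state) from (c, q)."""
    total = sum(counts[c].values())
    for s, cnt in counts[c].items():
        q2 = delta.get((q, s))  # a missing entry means the automaton rejects s
        if q2 is not None:
            yield s, cnt / total, (longest_context(counts, c + (s,), K), q2)

def backward(counts, K, delta, accepting, q0, n):
    """beta[t][state]: mass of accepted completions of length n - t."""
    # Forward sweep: only product states reachable at each step are stored.
    layers = [{((), q0)}]
    for _ in range(n):
        layers.append({nxt for c, q in layers[-1]
                       for _, _, nxt in moves(counts, K, delta, c, q)})
    beta = [dict() for _ in range(n + 1)]
    beta[n] = {(c, q): float(q in accepting) for c, q in layers[n]}
    for t in range(n - 1, -1, -1):
        for c, q in layers[t]:
            beta[t][(c, q)] = sum(p * beta[t + 1][nxt]
                                  for _, p, nxt in moves(counts, K, delta, c, q))
    return beta

def sample(counts, K, delta, q0, n, beta, rng):
    """Draw one length-n sequence from the model conditioned on acceptance."""
    state = ((), q0)
    if beta[0][state] == 0.0:
        raise ValueError("constraint has zero probability under the model")
    out = []
    for t in range(n):
        opts = [(s, nxt, p * beta[t + 1][nxt])
                for s, p, nxt in moves(counts, K, delta, *state)]
        opts = [o for o in opts if o[2] > 0.0]
        s, state, _ = rng.choices(opts, weights=[w for _, _, w in opts])[0]
        out.append(s)
    return "".join(out)

# Toy example: order-2 model, constraint "the sequence must end in 'c'".
corpus = ["abcab", "abab", "cabc"]
K, n = 2, 4
counts = train(corpus, K)
delta = {(q, s): (1 if s == "c" else 0) for q in (0, 1) for s in "abc"}
accepting = {1}
beta = backward(counts, K, delta, accepting, 0, n)
rng = random.Random(0)
print(sample(counts, K, delta, 0, n, beta, rng))
```

In this sketch, tracking only the context state rather than the full history is safe because the table is built from all substrings: every prefix of an observed context is itself observed, so the longest observed suffix after appending a symbol depends only on the current context and that symbol. The work per step is proportional to the number of reachable product edges, which is the quantity the abstract's complexity statement refers to.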