🤖 AI Summary
Existing insertion-based language models are largely ad hoc and lack a unified theoretical foundation. This work closes that gap by deriving a diffusion-style denoising objective from first principles, modeling the noising process as a continuous-time Markov chain on the space of variable-length sequences. The resulting framework recovers prior insertion language model formulations as special cases. On a synthetic planning task, the approach retains the advantages of insertion-based generation over left-to-right autoregressive models and masked diffusion models. In language modeling, it is competitive with both while offering more flexible sampling than existing insertion language models.
📝 Abstract
Insertion Language Models (ILMs) offer several advantages over left-to-right generation and mask-based generation. However, existing formulations of insertion-based generation have largely been ad hoc. In this paper, we derive a diffusion-style denoising objective for ILMs from first principles by formulating the noising process as a continuous-time Markov chain on the space of variable-length sequences. We show that previous formulations of ILMs can be viewed as special cases of this denoising framework. Through empirical evaluation on a synthetic planning task, we show that the proposed approach retains the benefits of insertion-based generation over left-to-right generation and masked diffusion models. In language modeling, our diffusion-based approach is competitive with left-to-right generation and masked diffusion models, while offering additional flexibility in sampling compared to existing insertion language models.