🤖 AI Summary
Standard stochastic interpolants (SIs) require direct access to samples from both endpoint distributions, which has so far prevented their use in end-to-end, jointly optimized latent variable models.
Method: We propose Latent Stochastic Interpolants (LSI), in which an encoder, a decoder, and a latent SI model are trained jointly and end to end. The encoder maps observations into a latent space, the SI generative process runs entirely within that latent space, and the decoder maps latents back to observations. LSI is the first extension of the SI framework to jointly learned latent variable models.
Contribution/Results: We derive a principled evidence lower bound (ELBO) directly in continuous time. It supports arbitrary prior distributions instead of relying on a standard Gaussian prior. On ImageNet image generation, LSI mitigates the computational cost of applying SI directly in high-dimensional observation spaces. It also preserves the generative flexibility and sample quality of the SI framework, demonstrating its effectiveness.
📝 Abstract
Stochastic Interpolants (SI) are a powerful framework for generative modeling, capable of flexibly transforming between two probability distributions. However, their use in jointly optimized latent variable models remains unexplored, as they require direct access to samples from the two distributions. This work presents Latent Stochastic Interpolants (LSI), which enable joint learning in a latent space with an end-to-end optimized encoder, decoder, and latent SI model. We achieve this by developing a principled Evidence Lower Bound (ELBO) objective derived directly in continuous time. The joint optimization allows LSI to learn effective latent representations along with a generative process that transforms an arbitrary prior distribution into the encoder-defined aggregated posterior. LSI sidesteps the simple priors of standard diffusion models and mitigates the computational demands of applying SI directly in high-dimensional observation spaces, while preserving the generative flexibility of the SI framework. We demonstrate the efficacy of LSI through comprehensive experiments on the standard large-scale ImageNet generation benchmark.
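The following is a rough, non-authoritative illustration of the data flow described above. It builds a generic stochastic interpolant in a latent space between a prior sample `x0` and an encoded data sample `x1`. It assumes a common coefficient choice from the general SI framework: `alpha(t) = 1 - t`, `beta(t) = t` and `gamma(t) = sqrt(a * t * (1 - t))`. The paper's actual schedule, networks, training loss, and continuous-time ELBO are not reproduced here. The linear `encode`/`decode` maps, the dimensions `D` and `d`, and the Laplace prior are hypothetical placeholders. The Laplace prior stands in for "an arbitrary prior" only to show that the prior need not be Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

D, d = 64, 8  # hypothetical observation and latent dimensions (d << D)

# Hypothetical linear encoder/decoder. In LSI these would be learned
# networks optimized jointly with the latent SI model; here they are fixed.
W_enc = rng.standard_normal((D, d)) / np.sqrt(D)  # shape (D, d)
W_dec = np.linalg.pinv(W_enc)                     # shape (d, D)

def encode(x):
    """Map observations of shape (batch, D) to latents of shape (batch, d)."""
    return x @ W_enc

def decode(z):
    """Map latents of shape (batch, d) back to observations of shape (batch, D)."""
    return z @ W_dec

def interpolant(x0, x1, t, eps, a=1.0):
    """Generic stochastic interpolant x_t = alpha(t) x0 + beta(t) x1 + gamma(t) eps.

    Assumes alpha(t) = 1 - t, beta(t) = t, gamma(t) = sqrt(a t (1 - t)),
    so x_t equals x0 at t = 0 and x1 at t = 1. `t` has shape (batch, 1).
    """
    gamma = np.sqrt(a * t * (1.0 - t))
    return (1.0 - t) * x0 + t * x1 + gamma * eps

# Usage: one batch through encoder -> latent interpolant -> decoder.
batch = 4
x_data = rng.standard_normal((batch, D))  # stand-in for flattened images
x1 = encode(x_data)                       # data endpoint, in latent space
x0 = rng.laplace(size=(batch, d))         # non-Gaussian prior endpoint
eps = rng.standard_normal((batch, d))     # interpolant noise
t = rng.uniform(size=(batch, 1))          # one time per sample in [0, 1)
x_t = interpolant(x0, x1, t, eps)         # latent point the SI model would see
x_rec = decode(x1)                        # map the latent back to observations
```

Because the interpolant lives in the `d`-dimensional latent space rather than the `D`-dimensional observation space, every operation on `x_t` scales with `d`. That is the kind of saving the abstract attributes to working in latent space.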