Markov Chain Decoders Overcome the Heavy-Tail Limitations of Lipschitz Generative Models

📅 2026-05-18

🤖 AI Summary

This work addresses the limitation of standard variational autoencoders in modeling heavy-tailed distributions, which arises from the use of Gaussian decoders and implicit Lipschitz constraints. To overcome this structural restriction, the authors propose, for the first time, incorporating Phase-Type (PH) distributions into the decoder. Leveraging the underlying Markovian representation of PH distributions, the method preserves the original encoder architecture and training procedure while effectively capturing heavy-tailed characteristics. The approach is theoretically grounded and empirically effective: on synthetic Pareto-distributed data, it reduces the tail Kolmogorov–Smirnov distance by up to sixfold and decreases extreme quantile estimation error by as much as tenfold compared to Gaussian baselines.
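The summary's central idea is to replace the Gaussian reconstruction likelihood with a PH density while leaving the encoder and training objective unchanged. A minimal sketch of that likelihood follows. A PH distribution with initial probabilities $\boldsymbol{\alpha}$ over $m$ transient phases and sub-generator $T$ has density $f(x) = \boldsymbol{\alpha}\, e^{Tx}\, \mathbf{t}$, where $\mathbf{t} = -T\mathbf{1}$ is the exit-rate vector. The code evaluates this log-density, which could stand in for the Gaussian reconstruction term of the ELBO. The summary does not say how the model maps latent codes to $\boldsymbol{\alpha}$ and $T$, so here they are simply given as inputs. `expm_taylor`, `ph_logpdf`, and `ph_nll` are hypothetical helpers; in practice `scipy.linalg.expm` would replace the hand-written matrix exponential.

```python
import numpy as np

def expm_taylor(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    norm = np.linalg.norm(A, ord=1)
    # Pick s so that ||A / 2**s||_1 <= 1/2, where the Taylor series converges fast.
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / (2 ** s)
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms + 1):
        term = term @ B / k          # B^k / k!
        result = result + term
    # Undo the scaling: exp(A) = exp(B)^(2**s).
    for _ in range(s):
        result = result @ result
    return result

def ph_logpdf(x, alpha, T):
    """Log-density of PH(alpha, T) at x >= 0 (hypothetical decoder likelihood term)."""
    alpha = np.asarray(alpha, dtype=float)
    T = np.asarray(T, dtype=float)
    exit_rates = -T.sum(axis=1)      # t = -T 1: absorption rate out of each phase
    density = alpha @ expm_taylor(T * x) @ exit_rates
    return float(np.log(density))

def ph_nll(xs, alpha, T):
    """Batch negative log-likelihood: a stand-in for the Gaussian reconstruction loss."""
    return -sum(ph_logpdf(x, alpha, T) for x in xs)
```

A one-phase PH distribution is an exponential distribution. So `ph_logpdf(0.7, [1.0], [[-2.0]])` should agree with $\log 2 - 1.4$ up to rounding, which makes a quick sanity check.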

📝 Abstract

Heavy-tailed distributions are prevalent in performance evaluation, network traffic, and risk modeling, and they pose a fundamental challenge for modern deep generative models. Standard Variational Autoencoders (VAEs) combine Gaussian decoder likelihoods with Lipschitz-constrained neural networks, a combination that is structurally incapable of producing heavy-tailed outputs: the Gaussian tail decays exponentially, and Lipschitz continuity prevents the decoder from amplifying rare events in its latent input enough to overcome this decay. We provide both a theoretical characterization of this limitation and a controlled empirical demonstration using synthetic Pareto data across a grid of tail indices $\alpha \in \{2, 3, 5, 30\}$ and dimensions $d \in \{1, 5, 10\}$. As a solution, we replace the Gaussian decoder with a Phase-Type (PH) distribution based on Markov chains, while keeping the encoder, latent space, and training procedure identical. PH distributions can approximate any positive-valued distribution arbitrarily closely, including heavy-tailed families. In experiments, the PH-based model reduces the tail Kolmogorov–Smirnov distance by up to 6× and the extreme quantile error by up to 10× compared to the Gaussian baseline on heavy-tailed data. These results demonstrate that integrating Markov chain-based distributions into the decoder of a generative model provides a principled and practically effective solution to the heavy-tail generation problem.
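The two evaluation quantities in the abstract can be sketched as follows, using definitions the abstract does not spell out. The tail KS distance is taken here as the two-sample KS statistic restricted to values above the 95th percentile of the real data. The extreme quantile error is taken as the relative error of the 99.9th percentile. Both thresholds, all function names, and the moment-matched Gaussian used as a stand-in for the VAE baseline are hypothetical. Only the 1-D case is shown.

```python
import numpy as np

def pareto_samples(alpha, n, rng, x_m=1.0):
    """Inverse-CDF sampling from Pareto(alpha, x_m): F(x) = 1 - (x_m / x)**alpha."""
    u = rng.random(n)
    return x_m * (1.0 - u) ** (-1.0 / alpha)

def tail_ks(real, generated, q=0.95):
    """Hypothetical tail KS: two-sample KS statistic above the q-quantile of the real data."""
    thr = np.quantile(real, q)
    a = np.sort(real[real > thr])
    b = np.sort(generated[generated > thr])
    if a.size == 0 or b.size == 0:
        return 1.0                       # no tail mass generated at all: worst case
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def extreme_quantile_error(real, generated, q=0.999):
    """Hypothetical metric: relative error of the generated q-quantile."""
    qr = np.quantile(real, q)
    qg = np.quantile(generated, q)
    return float(abs(qg - qr) / qr)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for a in (2.0, 3.0, 5.0, 30.0):      # tail indices from the abstract's grid
        real = pareto_samples(a, 100_000, rng)
        # Light-tailed stand-in for a Gaussian decoder: moment-matched normal samples.
        gauss = rng.normal(real.mean(), real.std(), real.size)
        print(a, tail_ks(real, gauss), extreme_quantile_error(real, gauss))
```

Restricting the KS statistic to the upper tail matters. An unrestricted KS distance is dominated by the bulk of the distribution, where a Gaussian fit can look acceptable.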

Problem

Research questions and friction points this paper is trying to address.

heavy-tailed distributions

generative models

Variational Autoencoders

Lipschitz continuity

tail modeling
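The structural limitation listed above can be made concrete with a short derivation. This is one standard argument via Gaussian concentration of Lipschitz functions, not necessarily the paper's own proof. It assumes a fixed (homoscedastic) decoder standard deviation $\sigma$. Here $z$ is the latent code, $f$ is the decoder mean network with Lipschitz constant $L$, and $y$ is the generated sample.

$$
\begin{aligned}
&z \sim \mathcal{N}(0, I_k), \quad \varepsilon \sim \mathcal{N}(0, I_d), \quad y = f(z) + \sigma \varepsilon, \quad \|f(z) - f(z')\| \le L \|z - z'\|, \\[4pt]
&\bigl|\,\|y\| - \|y'\|\,\bigr| \le L\|z - z'\| + \sigma\|\varepsilon - \varepsilon'\| \le \sqrt{L^2 + \sigma^2}\,\bigl\|(z - z', \varepsilon - \varepsilon')\bigr\|, \\[4pt]
&\Pr\bigl(\|y\| \ge \mathbb{E}\|y\| + r\bigr) \le \exp\!\left(-\frac{r^2}{2(L^2 + \sigma^2)}\right), \quad r > 0, \\[4pt]
&\text{whereas for Pareto data:} \quad \Pr(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}, \quad x \ge x_m.
\end{aligned}
$$

The third line applies the Gaussian concentration inequality for Lipschitz functions to the joint standard Gaussian vector $(z, \varepsilon)$. The model's tail is therefore sub-Gaussian, while the Pareto tail decays only polynomially. For any finite $L$ and $\sigma$, the ratio of the Pareto tail to this bound grows without limit as $x \to \infty$.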

Innovation

Methods, ideas, or system contributions that make the work stand out.

Markov Chain Decoders

Heavy-tailed Distributions

Phase-Type Distributions
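The Markov-chain mechanism behind a Phase-Type decoder can be illustrated by simulating the underlying absorbing continuous-time Markov chain. The chain starts in a phase drawn from $\boldsymbol{\alpha}$ and waits an exponential time with rate $-T_{ii}$. It then jumps to another phase or to absorption, with probabilities proportional to the off-diagonal rates and the exit rate. The PH sample is the total time until absorption. The three-phase chain below is a hypothetical example, not a model fitted in the paper. Its slow final phase (exit rate 0.01) produces occasional very long excursions, which is how mixtures of phases push mass into the tail. The analytic mean $\boldsymbol{\alpha}(-T)^{-1}\mathbf{1}$, which equals 46.6 for this chain, serves as a sanity check. The function names are illustrative.

```python
import numpy as np

def sample_ph(alpha, T, n, rng):
    """Draw n samples from PH(alpha, T) by simulating the absorbing CTMC."""
    alpha = np.asarray(alpha, dtype=float)
    T = np.asarray(T, dtype=float)
    m = alpha.size
    exit_rates = -T.sum(axis=1)            # absorption rate out of each transient phase
    samples = np.empty(n)
    for k in range(n):
        phase = rng.choice(m, p=alpha)     # initial phase
        elapsed = 0.0
        while True:
            rate = -T[phase, phase]        # total rate of leaving the current phase
            elapsed += rng.exponential(1.0 / rate)
            # Jump probabilities: transient phases first, absorption in the last slot.
            probs = np.append(T[phase], exit_rates[phase]) / rate
            probs[phase] = 0.0             # no self-transitions
            nxt = rng.choice(m + 1, p=probs)
            if nxt == m:                   # absorbed: total holding time is the sample
                break
            phase = nxt
        samples[k] = elapsed
    return samples

def ph_mean(alpha, T):
    """Analytic mean alpha (-T)^{-1} 1 of a PH distribution."""
    alpha = np.asarray(alpha, dtype=float)
    T = np.asarray(T, dtype=float)
    return float(alpha @ np.linalg.solve(-T, np.ones(alpha.size)))

if __name__ == "__main__":
    # Hypothetical 3-phase chain with fast, medium, and slow phases.
    alpha = [0.6, 0.3, 0.1]
    T = [[-1.0, 0.5, 0.0],
         [0.0, -0.1, 0.05],
         [0.0, 0.0, -0.01]]
    rng = np.random.default_rng(0)
    x = sample_ph(alpha, T, 20_000, rng)
    print(x.mean(), ph_mean(alpha, T), np.quantile(x, 0.999))
```

The loop favors clarity over speed. A decoder trained by gradient descent would instead evaluate the PH density in closed form and draw samples only at generation time.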