🤖 AI Summary
This paper addresses unsupervised representation learning for sequential data. It proposes a probabilistic flow decomposition framework that factorizes the latent-space dynamics into two families of vector fields: curl-free fields (irrotational potential flows) and divergence-free fields (solenoidal rotational flows). A sparsity prior encourages only a small number of these fields to be active at any instant. Within a variational autoencoder framework, the method jointly optimizes representation encoding, velocity field estimation, and field-structure inference, implicitly learning approximately equivariant representations. The resulting model disentangles both the static factors of the representation and the dynamic transformation primitives, and it achieves state-of-the-art data likelihood and unsupervised approximate equivariance error on sequence transformation datasets. The learned vector fields also admit clear physical interpretations grounded in classical vector calculus.
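The decomposition described above can be written compactly. The block below is a sketch rather than the paper's exact formulation: the symbols $K$, $M$, $a_k$, $b_m$, $\phi_k$, and $\mathbf{r}_m$ are notation assumed here for illustration, and the curl operator is written as in two or three dimensions.

```latex
% Latent velocity as a sparse combination of potential and rotational fields
% (all symbols are illustrative assumptions, not taken from the paper).
\begin{align}
  \mathbf{v}(\mathbf{z}, t)
    &= \sum_{k=1}^{K} a_k(t)\, \nabla \phi_k(\mathbf{z})
     + \sum_{m=1}^{M} b_m(t)\, \mathbf{r}_m(\mathbf{z}), \\
  \nabla \times \nabla \phi_k &= \mathbf{0}
    \quad \text{(curl-free, potential flow)}, \\
  \nabla \cdot \mathbf{r}_m &= 0
    \quad \text{(divergence-free, rotational flow)}.
\end{align}
```

Under this reading, the sparsity prior acts on the coefficients $a_k(t)$ and $b_m(t)$: most are zero at any instant, and the nonzero ones set the speed of the flow along the corresponding field.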
📝 Abstract
There is a vast literature on representation learning based on principles such as coding efficiency, statistical independence, causality, controllability, or symmetry. In this paper, we propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components. Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model, before being decoded to predict a future input state. The flow model is decomposed into a number of rotational (divergence-free) vector fields and a number of potential flow (curl-free) fields. Our sparsity prior encourages only a small number of these fields to be active at any instant and infers the speed with which the probability flows along these fields. Training this model is completely unsupervised, uses a standard variational objective, and results in a new form of disentangled representation in which the input is represented not only by a combination of independent factors, but also by a combination of independent transformation primitives given by the learned flow fields. When viewing the transformations as symmetries, one may interpret this as learning approximately equivariant representations. Empirically, we demonstrate that this model achieves state-of-the-art results in terms of both data likelihood and unsupervised approximate equivariance errors on datasets composed of sequence transformations.
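To make the two kinds of field concrete, the following minimal sketch builds one curl-free and one divergence-free field in a two-dimensional latent space and checks both properties numerically. It is not the paper's implementation: the fields here are fixed analytic functions rather than learned ones, and the names `potential_field`, `rotational_field`, `velocity`, and `coeffs` are hypothetical, as are the specific potential and stream function.

```python
import numpy as np

def potential_field(z):
    # Gradient of the assumed potential phi(z) = 0.5 * ||z||^2, i.e. z itself.
    # Any gradient field is curl-free.
    return np.array([z[0], z[1]])

def rotational_field(z):
    # 90-degree rotation of grad(psi) for the assumed stream function
    # psi(z) = sin(z0) * cos(z1): r = (d psi / d z1, -d psi / d z0).
    # In 2D such a field is divergence-free.
    return np.array([-np.sin(z[0]) * np.sin(z[1]),
                     -np.cos(z[0]) * np.cos(z[1])])

def jacobian(field, z, eps=1e-4):
    # Central-difference Jacobian; J[i, j] approximates d field_i / d z_j.
    J = np.zeros((2, 2))
    for j in range(2):
        step = np.zeros(2)
        step[j] = eps
        J[:, j] = (field(z + step) - field(z - step)) / (2 * eps)
    return J

def divergence(field, z):
    return np.trace(jacobian(field, z))

def curl(field, z):
    # Scalar curl in 2D: d v1 / d z0 - d v0 / d z1.
    J = jacobian(field, z)
    return J[1, 0] - J[0, 1]

def velocity(z, coeffs):
    # Latent velocity as a weighted sum of the fields; a sparse coeffs vector
    # switches most fields off, and the nonzero entries act as flow speeds.
    fields = (potential_field, rotational_field)
    return sum(c * f(z) for c, f in zip(coeffs, fields))

z = np.array([0.3, -0.7])
coeffs = np.array([0.0, 1.5])            # only the rotational field is active
z_next = z + 0.1 * velocity(z, coeffs)   # one explicit Euler step of the flow

print("curl of potential field:", curl(potential_field, z))
print("divergence of rotational field:", divergence(rotational_field, z))
print("next latent point:", z_next)
```

In this toy setting a single point is moved by one Euler step, whereas the abstract describes transporting a whole distribution of latent activations, with the active fields and their speeds inferred from data.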