Hands-On Generative Adversarial Networks with Keras

Reversible flows

Reversible flow models have been gaining attention in the research community, given their impressive results in image and speech synthesis. Flow-based generative models are reversible in nature: a single model f with parameters θ estimates both the conditional probability of the data x given the latent vector z, p(x|z), and the probability of the latent vector given the data, p(z|x). This characteristic of reversible flow models enables exact latent-variable inference and log-likelihood evaluation without any approximation, an advantage over VAEs.

Reversible flow models are composed of multiple layers of flow. Each flow layer uses a neural network, which itself need not be invertible, to compute the parameters of an affine coupling layer that scales and translates part of the data. Since scaling and translation are invertible operations, each flow layer is a bijection f, ensuring the following:

f⁻¹(f(z)) = z and f(f⁻¹(x)) = x
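As a sketch of this idea, the following NumPy snippet implements a single affine coupling layer. The small tanh network, its weight shapes, and the 50/50 split of the features are illustrative choices, not taken from the book; note that the network predicting the scale and translation is itself not invertible, yet the layer as a whole is:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer MLP standing in for the (non-invertible)
# network that predicts the coupling layer's scale and translation.
W1 = rng.normal(scale=0.1, size=(2, 8))
W2 = rng.normal(scale=0.1, size=(8, 4))

def st_network(x_a):
    """Predict log-scale and translation from the unchanged half x_a."""
    h = np.tanh(x_a @ W1)
    out = h @ W2
    return out[:, :2], out[:, 2:]  # log_s, t

def coupling_forward(x):
    """Affine coupling: x_a passes through, x_b is scaled and translated."""
    x_a, x_b = x[:, :2], x[:, 2:]
    log_s, t = st_network(x_a)
    y_b = x_b * np.exp(log_s) + t
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    log_det = log_s.sum(axis=1)
    return np.concatenate([x_a, y_b], axis=1), log_det

def coupling_inverse(y):
    """Exact inverse: recompute (log_s, t) from the unchanged half and undo."""
    y_a, y_b = y[:, :2], y[:, 2:]
    log_s, t = st_network(y_a)
    x_b = (y_b - t) * np.exp(-log_s)
    return np.concatenate([y_a, x_b], axis=1)

x = rng.normal(size=(5, 4))
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
print(np.allclose(x, x_rec))  # prints True: the layer is exactly invertible
```

Because the unchanged half y_a equals x_a, the inverse pass can recompute exactly the same scale and translation, which is why invertibility of the inner network is never required.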

This brings with it another very nice property: by restricting each flow layer to be bijective, the likelihood can be calculated directly by using the change of variables formula. Without loss of generality, let's assume the latent variable z follows a spherical Gaussian, and describe the maximum likelihood objective in the context of a reversible flow model:

log p(x) = log p(z) + log |det(∂z/∂x)|, with z = f⁻¹(x)

The first term is the log-likelihood of a spherical Gaussian, and it penalizes the norm of z. The second term arises from the change of variables and is the log-determinant of the Jacobian; it rewards any layer for increasing the volume of the space during the forward pass.
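This computation can be sketched numerically. The snippet below uses a hypothetical one-layer elementwise affine flow (the parameters a and b are arbitrary illustrations, not from the book) and evaluates the log-likelihood as the spherical-Gaussian log-density of z plus the log-determinant term:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4

# Hypothetical elementwise affine flow x = z * exp(a) + b,
# so z = (x - b) * exp(-a) and log|det dz/dx| = -sum(a).
a = rng.normal(scale=0.5, size=D)
b = rng.normal(size=D)

def inverse_flow(x):
    z = (x - b) * np.exp(-a)
    log_det = -a.sum()  # log-determinant of the Jacobian dz/dx
    return z, log_det

def log_likelihood(x):
    z, log_det = inverse_flow(x)
    # Log-density of a spherical (standard) Gaussian prior on z:
    # the squared-norm term is exactly the norm penalty in the text.
    log_pz = -0.5 * (z ** 2).sum(axis=1) - 0.5 * D * np.log(2 * np.pi)
    return log_pz + log_det  # change-of-variables formula

x = rng.normal(size=(3, D))
print(log_likelihood(x))  # one exact log-likelihood per sample
```

A deep flow model simply chains many such layers, summing the per-layer log-determinants along the way.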

A closer look at the preceding equation shows that, unlike VAEs, by being bijective, reversible flow models allow for the direct computation of the evidence p(x), which is exact, whereas a VAE's ELBO only lower-bounds it.

Similar to VAEs, reversible flow models allow for efficient, parallelizable training and inference. In addition, and again like VAEs, flow-based models allow for manipulations of the latent space, including interpolation between data points and conditional synthesis based on latent-space clustering.
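As a small illustration of latent-space manipulation, here is a hypothetical linear interpolation between two latent vectors; in a real flow model, z1 and z2 would come from encoding two data points with f⁻¹, and each interpolated z would be decoded with the forward pass f(z):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for the latents of two encoded samples (dimensions are illustrative).
z1 = rng.normal(size=16)
z2 = rng.normal(size=16)

# Linear interpolation in latent space; decoding each row with the flow's
# forward pass would yield a smooth morph between the two samples.
alphas = np.linspace(0.0, 1.0, 5)
path = np.stack([(1 - a) * z1 + a * z2 for a in alphas])
print(path.shape)  # prints (5, 16)
```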

Glow is one of the best flow-based models for image generation. The following figure shows faces generated by the model. We invite the reader to compare this sample to the other samples provided in this chapter:

Source: Glow: Generative Flow with Invertible 1x1 Convolutions (https://arxiv.org/abs/1807.03039)

Researchers at NVIDIA have also applied flow-based generative models to speech synthesis. Refer to the paper at the following link for further reference: https://arxiv.org/abs/1811.00002