AE-OT-GAN: Training GANs from data specific latent distribution

D An, Y Guo, M Zhang, X Qi, N Lei, X Gu - Computer Vision–ECCV 2020 …, 2020 - Springer
Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020. Springer.
Abstract
Though generative adversarial networks (GANs) are prominent models for generating realistic and crisp images, they are unstable to train and suffer from the mode collapse problem. These problems arise from approximating an intrinsically discontinuous distribution transform map with continuous DNNs. The recently proposed AE-OT model addresses the discontinuity problem by explicitly computing the discontinuous optimal transport map in the latent space of an autoencoder. Though free of mode collapse, the images generated by AE-OT are blurry. In this paper, we propose the AE-OT-GAN model to combine the advantages of both models: it generates high-quality images and at the same time avoids mode collapse. Specifically, we first embed the low-dimensional image manifold into the latent space with an autoencoder (AE). Then the extended semi-discrete optimal transport (SDOT) map is used to generate new latent codes. Finally, our GAN model is trained to generate high-quality images from the latent distribution induced by the extended SDOT map. The distribution transform map from this dataset-specific latent distribution to the data distribution is continuous, and thus can be well approximated by continuous DNNs. Additionally, the pairing between latent codes and real images imposes a further constraint on the generator and stabilizes the training process. Experiments on the simple MNIST dataset and on complex datasets like CIFAR10 and CelebA show the advantages of the proposed method.
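The SDOT step can be sketched in a few lines of NumPy. The following is a minimal illustration, not the paper's implementation: random 2-D points stand in for the autoencoder latent codes of real images, and a dual ascent adjusts the Brenier potential heights h so that each code's cell receives equal mass from the Gaussian noise source. The map T(x) = z_argmax is the semi-discrete OT map; the piecewise-linear extension used in AE-OT-GAN to interpolate between codes is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 2
codes = rng.normal(size=(n, d))       # stand-ins for latent codes z_i of real images
X = rng.normal(size=(20000, d))       # empirical source measure: Gaussian noise samples
h = np.zeros(n)                       # dual variables (Brenier potential heights)

def cells(X, h):
    """Cell assignment under the Brenier potential u_h(x) = max_i (<x, z_i> + h_i).
    The semi-discrete OT map sends x to the code z_i that attains the max."""
    return np.argmax(X @ codes.T + h, axis=1)

for _ in range(3000):
    freq = np.bincount(cells(X, h), minlength=n) / len(X)
    h -= 0.5 * (freq - 1.0 / n)       # gradient step: shrink over-full cells, grow others

# After convergence, each cell carries (approximately) mass 1/n,
# so every latent code is reached -- no mode is dropped.
freq = np.bincount(cells(X, h), minlength=n) / len(X)
```

Sampling a new latent code then amounts to drawing Gaussian noise and mapping it through the fitted cells; because every cell has positive mass, all modes of the (here, toy) latent distribution are covered.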