Diffusion Models Beat GANs on Image Synthesis - A Summary
Terminology and Prerequisite Knowledge
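- Generative Adversarial Networks (GANs) - generative models trained as a two-player game between a generator, which produces samples, and a discriminator, which tries to tell generated samples apart from real data.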
- Fréchet Inception Distance (FID) - metric for assessing the quality of images produced by a generative model. Unlike the Inception Score, which looks only at generated images, FID compares the distribution of generated images with that of real images: both sets are passed through a pretrained Inception network, a multivariate Gaussian is fitted to each set's activations from a deep layer, and the Fréchet distance between the two Gaussians is reported. Lower is better.
- Likelihood-based generative models - class of generative models trained to maximize the likelihood (or a bound on it) that the model assigns to the training data, e.g. variational autoencoders (VAEs).
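The FID definition above reduces to a closed-form distance between two Gaussians: FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). Below is a minimal NumPy sketch of that formula, assuming the feature vectors have already been extracted from a deep layer of an Inception network (the extraction step is not shown). The function name `frechet_distance` is hypothetical, not the API of any FID library.

```python
import numpy as np

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: arrays of shape (num_samples, feature_dim),
    assumed to be precomputed Inception activations.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    # rowvar=False: each row is one sample, each column one feature
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)

    mean_term = np.sum((mu_r - mu_g) ** 2)
    # Tr((Sigma_r Sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of Sigma_r @ Sigma_g, which are real and non-negative for
    # covariance matrices; clip tiny negative round-off before the sqrt.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)
```

As a sanity check, two identical feature sets give a distance of about 0, and shifting every feature by a constant c leaves the covariances unchanged, so only the mean term survives (feature_dim * c^2). Computing the trace through eigenvalues avoids a dependency on a matrix square-root routine; production FID code typically uses `scipy.linalg.sqrtm` instead.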
Introduction
- GANs capture less diversity than state-of-the-art likelihood-based models, and they are difficult to train, scale, and apply to new domains.
- Likelihood-based generative models capture more diversity, but have lagged behind GANs in visual sample quality.
- Diffusion models - a class of likelihood-based models that has been shown to produce high-quality images while offering desirable properties such as distribution coverage, a stationary training objective, and easy scalability.
- Hypothesis - the quality gap between diffusion models and GANs stems from at least two factors:
  - The model architectures used in recent GAN literature have been heavily explored and refined.
  - GANs can trade off diversity for quality, producing high-quality samples without covering the whole distribution.
- Aim - improve the diffusion model architecture, then devise a scheme that lets diffusion models trade off diversity for quality.
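To make the "stationary training objective" of diffusion models listed above concrete, here is a minimal NumPy sketch of the forward (noising) process and the simplified noise-prediction objective of Ho et al. (2020). Assumptions: a linear beta schedule with T = 1000 steps, and a `denoiser(x_t, t)` callable standing in for the neural network; `q_sample`, `simple_loss`, and `denoiser` are hypothetical names. The summarized paper builds on an extended version of this objective, so treat this as an illustration of the idea, not its exact training loss.

```python
import numpy as np

# Assumed linear noise schedule; the summarized paper may use another one.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # alpha_bar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, noise):
    """Draw x_t ~ q(x_t | x_0) in closed form for timestep index t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def simple_loss(denoiser, x0, rng):
    """One Monte Carlo estimate of the noise-prediction objective."""
    t = rng.integers(0, T)                 # timestep sampled uniformly
    noise = rng.standard_normal(x0.shape)  # Gaussian noise to inject
    x_t = q_sample(x0, t, noise)
    # The regression target is always the injected noise, so the objective
    # does not shift during training the way a GAN's adversarial loss does.
    return np.mean((denoiser(x_t, t) - noise) ** 2)
```

A trivial hypothetical denoiser that always predicts zeros scores a loss near 1 (the variance of standard Gaussian noise); a trained network should do better.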
Architecture improvements
- Existing architecture (diffusion models) - UNet architecture