Diffusion Models Beat GANs on Image Synthesis - A Summary

Terminology and prerequisite knowledge

Generative Adversarial Networks (GANs) - class of generative models in which a generator and a discriminator are trained against each other.
Fréchet Inception Distance (FID) - metric for assessing the quality of generated images, widely used for GANs. Unlike the Inception Score, which looks only at generated images, it compares the distribution of generated images with that of real images: both sets are passed through a pretrained network such as Inception, a Gaussian (mean and covariance) is fitted to each set's activations from a deep layer, and the Fréchet distance between the two Gaussians is reported. Lower is better.
Likelihood-based generative models - class of generative models trained by maximizing the likelihood of the data, or a bound on it. For instance, variational autoencoders.
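To make the FID entry above concrete: for two Gaussians with means mu_r, mu_g and covariances S_r, S_g, the Fréchet distance is ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). The following is a minimal sketch of that formula only, assuming the Inception features have already been extracted. The function name `frechet_distance` is hypothetical, and the random arrays in the usage lines are stand-ins for real features, not actual data.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each argument is an array of shape (num_images, feature_dim).
    Assumption: the features come from a pretrained network such as Inception;
    this function does not run any network itself.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g. For covariance matrices these are real
    # and non-negative; clip to remove small negative values from round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)

# Usage with synthetic stand-in features (illustrative only).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(2000, 8))
fake = rng.normal(0.5, 1.0, size=(2000, 8))
print(frechet_distance(real, real))  # approximately 0 for identical sets
print(frechet_distance(real, fake))  # larger when the distributions differ
```

The eigenvalue route is used here only to keep the sketch dependent on NumPy alone; a matrix square root routine would be an equally valid way to get the trace term.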

GANs capture less diversity than state-of-the-art likelihood-based models and are difficult to train, scale, and apply to new domains.
Likelihood-based generative models capture more diversity, but have typically fallen short of GANs in visual sample quality.
Diffusion models - a class of likelihood-based models that have recently been shown to produce high-quality images while offering desirable properties such as distribution coverage, a stationary training objective, and easy scalability.
Hypothesis - diffusion models still trail GANs in sample quality for at least two reasons:
1. Model architectures used by recent GAN literature have been heavily explored and refined.
2. GANs are able to trade off diversity for quality, producing high quality samples but not covering the whole distribution.
Aim - bring both advantages to diffusion models: first improve the model architecture, then devise a scheme to trade off diversity for quality.