Publications

Diffusion2GAN: Distilling Diffusion Models into Conditional GANs

European Conference on Computer Vision (ECCV'24)

Publication date: October 3, 2024

Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park

We propose a method to distill a complex multi-step diffusion model into a single-step conditional GAN student, dramatically accelerating inference while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs from the diffusion model's ODE trajectory. For efficient regression loss computation, we propose E-LatentLPIPS, a perceptual loss that operates in the latent space of the diffusion model and uses an ensemble of augmentations. Even accounting for the cost of constructing the paired dataset, E-LatentLPIPS converges more efficiently than many existing distillation methods. Furthermore, we adapt the diffusion model to construct a multi-scale discriminator with a text alignment loss, yielding an effective conditional GAN-based formulation. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models, DMD, SDXL-Turbo, and SDXL-Lightning, on the zero-shot COCO benchmark.
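To make the ensembled latent-space perceptual loss concrete, here is a minimal PyTorch sketch under stated assumptions: the toy backbone, the `shared_augment` helper, and all shapes and hyperparameters are illustrative stand-ins, not the paper's released E-LatentLPIPS implementation (which calibrates a perceptual network on diffusion latents). The key ideas it shows are (1) the distance is computed directly on latents, never decoded pixels, and (2) each augmentation in the ensemble is applied identically to the student output and the teacher target so the pair stays aligned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLatentPerceptualNet(nn.Module):
    """Stand-in for a LatentLPIPS backbone: a real one would be a
    perceptual network (re)trained on 4-channel diffusion latents."""
    def __init__(self, in_ch: int = 4):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
        ])

    def forward(self, x):
        feats = []
        for conv in self.stages:
            x = F.relu(conv(x))
            feats.append(x)  # collect multi-stage features for the distance
        return feats

def shared_augment(a, b):
    """Sample ONE random augmentation (flip + cutout here, hypothetical
    choices) and apply it to both latents, preserving pair alignment."""
    if torch.rand(()) < 0.5:
        a, b = torch.flip(a, dims=[-1]), torch.flip(b, dims=[-1])
    h, w = a.shape[-2:]
    ch, cw = h // 4, w // 4
    top = int(torch.randint(0, h - ch + 1, ()))
    left = int(torch.randint(0, w - cw + 1, ()))
    mask = torch.ones_like(a)
    mask[..., top:top + ch, left:left + cw] = 0  # same cutout for both
    return a * mask, b * mask

def e_latent_lpips(net, student_latent, teacher_latent, num_aug: int = 4):
    """Average a latent-space perceptual distance over an ensemble of
    shared augmentations (an E-LatentLPIPS-style regression loss)."""
    total = 0.0
    for _ in range(num_aug):
        a, b = shared_augment(student_latent, teacher_latent)
        fa, fb = net(a), net(b)
        total += sum(F.mse_loss(x, y) for x, y in zip(fa, fb))
    return total / num_aug

if __name__ == "__main__":
    net = ToyLatentPerceptualNet()
    student = torch.randn(2, 4, 64, 64)  # one-step generator output (latent)
    teacher = torch.randn(2, 4, 64, 64)  # ODE-trajectory target (latent)
    print(e_latent_lpips(net, student, teacher).item())
```

Staying in latent space is what makes the regression loss cheap: no VAE decode is needed per training step, and the augmentation ensemble compensates for the weaker signal of a latent-space metric.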
