Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Jiapeng ZHU, Ceyuan YANG, Kecheng ZHENG, Yinghao XU, Zifan SHI, Yifei ZHANG, Qifeng CHEN, Yujun SHEN

Research output: Contribution to journal › Conference article published in journal › peer-review

1 Citation (Scopus)

Abstract

Due to the difficulty of scaling them up, generative adversarial networks (GANs) appear to be falling out of favor for text-conditioned image synthesis. Sparsely activated mixture-of-experts (MoE) has recently been shown to be a viable way to train large-scale models with limited resources. Inspired by this, we present Aurora, a GAN-based text-to-image generator that employs a collection of experts to learn feature processing, together with a sparse router that adaptively selects the most suitable expert for each feature point. We adopt a two-stage training strategy: we first learn a base model at 64 × 64 resolution, and then an upsampler that produces 512 × 512 images. Trained only on public data, our approach encouragingly closes the performance gap between GANs and industry-level diffusion models while maintaining fast inference. We release the code and checkpoints to support further development by the community.
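The abstract describes the sparse router only at a high level. Below is a minimal, pure-Python sketch of top-1 sparse MoE routing over feature points, offered as an illustration and not as Aurora's implementation. It assumes linear experts, a linear router with softmax gating in the style of Switch Transformer, and no text conditioning; the names `sparse_moe_layer`, `router_w`, and `experts` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (in_dim, out_dim)


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute x @ w for a vector x and a (len(x), out_dim) matrix w."""
    out_dim = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(out_dim)]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe_layer(features: List[Vector], router_w: Matrix,
                     experts: List[Matrix]) -> Tuple[List[Vector], List[int]]:
    """Hypothetical top-1 sparse MoE applied independently to each feature point.

    Returns (outputs, assignments), where assignments[i] is the index of the
    expert chosen for feature point i. Only the selected expert is evaluated.
    """
    outputs, assignments = [], []
    for x in features:
        probs = softmax(matvec(x, router_w))                # router score per expert
        k = max(range(len(probs)), key=probs.__getitem__)  # top-1 expert index
        y = matvec(x, experts[k])                           # run only that expert
        outputs.append([probs[k] * v for v in y])           # scale by gate probability
        assignments.append(k)
    return outputs, assignments


# Usage: a 2x2 feature map flattened into 4 feature points of dimension 4.
random.seed(0)
dim, num_experts = 4, 3
features = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
router_w = [[random.gauss(0, 1) for _ in range(num_experts)] for _ in range(dim)]
experts = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
           for _ in range(num_experts)]
out, assign = sparse_moe_layer(features, router_w, experts)
```

Scaling each output by its gate probability is what lets gradients reach the router in Switch-style MoE, since the top-1 choice itself is not differentiable. Whether Aurora uses the same gating is not stated in this abstract.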

Original language: English
Article number: 11093014
Pages (from-to): 18411-18423
Number of pages: 13
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publication status: Published - 13 Aug 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 - Nashville, United States
Duration: 11 Jun 2025 to 15 Jun 2025

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

