GRN: Generative Refinement Networks
This is the official implementation of the paper Generative Refinement Networks for Visual Synthesis. Neither diffusion nor autoregressive: GRN is a third way. Refines globally like an artist. Generates adaptively by complexity. New SOTA across image & video. The visual generation paradigm just got rewritten.
Table of Contents
- Introduction
- Demo
- Open-Source Plan
- Installation
- Class-to-Image
- Contact
- Acknowledgements
- Citation
Demo
Try our interactive Text-to-Image demo on Hugging Face Spaces and generate images from text prompts directly in your browser.
Introduction
Diffusion models dominate visual generation, but they allocate the same computational effort to every sample regardless of its complexity. Autoregressive (AR) models are complexity-aware, as evidenced by their variable likelihoods, but suffer from lossy tokenization and error accumulation.
We introduce Generative Refinement Networks (GRN), a new visual synthesis paradigm that addresses these issues:
- Near-lossless tokenization via Hierarchical Binary Quantization (HBQ)
- Global refinement mechanism that progressively perfects outputs like a human artist
- Entropy-guided sampling for complexity-aware, adaptive-step generation
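To make the last bullet more concrete, here is a minimal sketch of how per-token entropy could drive an adaptive number of refinement steps. It is not the paper's algorithm: the function names (`token_entropy`, `steps_for`), the linear mapping from mean entropy to step count, and the step bounds are all illustrative assumptions.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in bits) of one token's predicted distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def steps_for(token_probs, min_steps=4, max_steps=32):
    """Map mean token entropy to a step budget (hypothetical linear rule).

    token_probs: list of per-token probability distributions, all over the
    same vocabulary size. Confident (low-entropy) samples get few steps,
    uncertain (high-entropy) samples get more.
    """
    vocab = len(token_probs[0])
    max_entropy = math.log2(vocab)  # entropy of a uniform distribution
    mean_h = sum(token_entropy(p) for p in token_probs) / len(token_probs)
    frac = mean_h / max_entropy if max_entropy > 0 else 0.0
    return min_steps + round(frac * (max_steps - min_steps))

# A "simple" sample (peaked predictions) versus a "complex" one (uniform).
easy = [[1.0, 0.0], [1.0, 0.0]]
hard = [[0.5, 0.5], [0.5, 0.5]]
print(steps_for(easy), steps_for(hard))  # prints: 4 32
```

The point of the sketch is only the direction of the relationship: samples the model is already sure about get a small budget, while uncertain ones get a larger one.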
GRN achieves state-of-the-art results on ImageNet reconstruction and class-conditional generation, and scales effectively to text-to-image and text-to-video tasks.
Starting from a random token map, GRN randomly selects more predictions at each step and refines all input tokens. For example, compared to the second step, the third step filled six new tokens (pink), kept two tokens (blue), erased two tokens (yellow), and left six tokens blank (gray).
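The bookkeeping in that example can be reproduced with a few lines of Python. This is a sketch under assumptions: token positions are tracked with plain boolean selection masks, the map has 16 positions (the sum of the four counts above), and the name `classify_tokens` is hypothetical rather than taken from the codebase.

```python
def classify_tokens(prev_selected, curr_selected):
    """Count how token positions change between two consecutive steps.

    Both arguments are equal-length lists of booleans, one entry per token
    position, marking whether that position holds a selected prediction.
    """
    counts = {"filled": 0, "kept": 0, "erased": 0, "blank": 0}
    for was, now in zip(prev_selected, curr_selected):
        if now and not was:
            counts["filled"] += 1   # newly selected at this step
        elif now and was:
            counts["kept"] += 1     # selected at both steps
        elif was and not now:
            counts["erased"] += 1   # selected before, dropped now
        else:
            counts["blank"] += 1    # unselected at both steps
    return counts

# 16 positions arranged to match the counts in the example above.
step2 = [True] * 4 + [False] * 12
step3 = [True] * 2 + [False] * 2 + [True] * 6 + [False] * 6
print(classify_tokens(step2, step3))
# prints: {'filled': 6, 'kept': 2, 'erased': 2, 'blank': 6}
```

Note that the number of selected positions grows from 4 to 8 between the two steps, which matches the statement that more predictions are selected at each step.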
Open-Source Plan
GRN adopts a minimalist and self-contained design. This implementation is written in PyTorch and targets GPUs.
| Task | Checkpoints | Inference Code | Training Code |
|---|---|---|---|
| T2V | ⬜ | ⬜ | ✅ |
| T2I | ⬜ | ⬜ | ✅ |
| C2I | ⬜ | ✅ | ✅ |
Installation
Step 1: Clone the repository
git clone https://github.com/MGenAI/GRN
cd GRN
Step 2: Create conda environment
A suitable conda environment named GRN can be created and activated with:
conda env create -f environment.yaml
conda activate GRN
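Once the environment is active, a short check such as the one below can confirm that PyTorch imports cleanly and can see a GPU. The helper name `check_torch` is illustrative; it relies only on the standard `torch.__version__` and `torch.cuda.is_available()` attributes.

```python
def check_torch():
    """Report whether torch imports and whether a CUDA device is visible."""
    try:
        import torch
    except ImportError as err:
        # Loader problems such as "undefined symbol" typically surface here too.
        return {"ok": False, "error": str(err)}
    return {
        "ok": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
    }

print(check_torch())
```

If the result has `"ok": False`, the error text indicates whether the Troubleshooting steps apply.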
Troubleshooting
If importing torch fails with undefined symbol: iJIT_NotifyEvent, reinstall it with:
pip uninstall torch
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
Check this issue for more details.
Class-to-Image
Dataset
Download the ImageNet dataset and place it under your IMAGENET_PATH.
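Before launching a long run, it can help to sanity-check the dataset folder. The sketch below assumes the common layout of one sub-directory per class under a `train` split, and assumes `IMAGENET_PATH` is exported as an environment variable. Neither is confirmed by this README, so adjust to whatever the training scripts expect. `count_classes` is a hypothetical helper.

```python
import os

def count_classes(imagenet_path, split="train"):
    """Return the number of class sub-directories under <imagenet_path>/<split>."""
    split_dir = os.path.join(imagenet_path, split)
    if not os.path.isdir(split_dir):
        raise FileNotFoundError(f"missing directory: {split_dir}")
    return sum(
        1
        for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )

# Assumption: IMAGENET_PATH is set in the environment; nothing runs otherwise.
path = os.environ.get("IMAGENET_PATH")
if path:
    print(count_classes(path))  # ImageNet-1k should report 1000 classes
```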
Training
All training scripts are located in scripts/c2i/. We suggest 8 GPUs with 80 GB of memory each for most models; the larger models need more, as listed in the table.
| Model | Training Script | GPUs Required |
|---|---|---|
| GRN_ind_B | bash scripts/c2i/train_GRN_ind_B.sh | 8x80GB |
| GRN_bit_B | bash scripts/c2i/train_GRN_bit_B.sh | 8x80GB |
| GRN_ind_L | bash scripts/c2i/train_GRN_ind_L.sh | 8x80GB |
| GRN_ind_H | bash scripts/c2i/train_GRN_ind_H.sh | 16x80GB |
| GRN_ind_G | bash scripts/c2i/train_GRN_ind_G.sh | 32x80GB |
Evaluation
PyTorch pre-trained models are available here.
All evaluation scripts are located in scripts/c2i/. We suggest using 8 GPUs with 80 GB of memory each.
| Model | Evaluation Script |
|---|---|
| GRN_ind_B | bash scripts/c2i/eval_GRN_ind_B.sh |
| GRN_bit_B | bash scripts/c2i/eval_GRN_bit_B.sh |
| GRN_ind_L | bash scripts/c2i/eval_GRN_ind_L.sh |
| GRN_ind_H | bash scripts/c2i/eval_GRN_ind_H.sh |
| GRN_ind_G | bash scripts/c2i/eval_GRN_ind_G.sh |
We use torch-fidelity to evaluate FID and IS against a reference image folder or pre-computed statistics. The reference statistics are JiT's pre-computed ones, stored under grn/utils_c2i/fid_stats.
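For readers who want to compute the same metrics outside the provided scripts, the sketch below calls torch-fidelity's `calculate_metrics` directly. It compares two image folders and does not cover the pre-computed statistics path that the repository's scripts use. The wrapper names `evaluate_folder` and `summarize` are hypothetical.

```python
def evaluate_folder(generated_dir, reference_dir, use_cuda=True):
    """Compute IS and FID for generated images against a reference folder."""
    # Lazy import so this module loads even where torch-fidelity is absent.
    from torch_fidelity import calculate_metrics
    return calculate_metrics(
        input1=generated_dir,
        input2=reference_dir,
        cuda=use_cuda,
        isc=True,  # Inception Score
        fid=True,  # Frechet Inception Distance
    )

def summarize(metrics):
    """Pick the headline numbers out of the metrics dictionary."""
    return {
        "IS": metrics["inception_score_mean"],
        "FID": metrics["frechet_inception_distance"],
    }

# Example usage (paths are placeholders):
# print(summarize(evaluate_folder("samples/", "imagenet_val/")))
```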
Contact
If you are interested in scaling GRN for image generation, image editing, video generation, video editing, or unified models, please feel free to reach out!
Email: hanjian.thu123@bytedance.com
Acknowledgements
- Thanks to JiT, Infinity and InfinityStar for their wonderful work and codebase!
Citation
If you find our work useful, please consider citing:
@misc{han2026grn,
title={Generative Refinement Networks for Visual Synthesis},
author={Jian Han and Jinlai Liu and Jiahuan Wang and Bingyue Peng and Zehuan Yuan},
year={2026},
eprint={2604.13030},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2604.13030},
}