BitDance-14B-64x (Diffusers)

Diffusers-converted checkpoint for BitDance-14B-64x with bundled custom pipeline code (bitdance_diffusers) for direct loading with DiffusionPipeline.

Quickstart (native diffusers)

import torch
from diffusers import DiffusionPipeline

# Hub repo ID; replace with a local checkout path (recommended: no trust_remote_code needed)
model_path = "BiliSakura/BitDance-14B-64x-diffusers"
pipe = DiffusionPipeline.from_pretrained(
    model_path,
    custom_pipeline=model_path,
    torch_dtype=torch.bfloat16,
).to("cuda")

result = pipe(
    prompt="A close-up portrait in a cinematic photography style, capturing a girl-next-door look on a sunny daytime urban street. She wears a khaki sweater, with long, flowing hair gently draped over her shoulders. Her head is turned slightly, revealing soft facial features illuminated by realistic, delicate sunlight coming from the left. The sunlight subtly highlights individual strands of her hair. The image has a Canon film-like color tone, evoking a warm nostalgic atmosphere.",
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=7.5,
)
result.images[0].save("bitdance_14b_64x.png")

Running Tests

Run tests from the model directory in your active Python environment:

python test_bitdance.py

VRAM Usage by Resolution

Measured on NVIDIA A100-SXM4-80GB using:

  • dtype=torch.bfloat16
  • num_inference_steps=30
  • guidance_scale=7.5
  • prompt: A close-up portrait in a cinematic photography style, capturing a girl-next-door look on a sunny daytime urban street. She wears a khaki sweater, with long, flowing hair gently draped over her shoulders. Her head is turned slightly, revealing soft facial features illuminated by realistic, delicate sunlight coming from the left. The sunlight subtly highlights individual strands of her hair. The image has a Canon film-like color tone, evoking a warm nostalgic atmosphere.
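
| Resolution | Peak Allocated VRAM (GiB) | Peak Reserved VRAM (GiB) | Time (s) | Status |
|------------|---------------------------|--------------------------|----------|--------|
| 512x512    | 39.60                     | 40.62                    | 4.08     | ok     |
| 1024x1024  | 41.21                     | 50.15                    | 15.79    | ok     |
| 1280x768   | 40.88                     | 49.52                    | 14.78    | ok     |
| 768x1280   | 40.88                     | 49.52                    | 14.75    | ok     |
| 1536x640   | 40.88                     | 49.52                    | 14.76    | ok     |
| 2048x512   | 41.21                     | 50.15                    | 15.85    | ok     |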
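
One pattern in these measurements is that rows with the same total pixel count report the same peak memory, regardless of aspect ratio. The sketch below copies the rows from the table and groups them by pixel count to make that visible. It is only an observation about these six runs, not a guarantee for other resolutions or hardware; the helper names are illustrative and not part of the repository.

```python
# Rows copied from the measurements above:
# (resolution, peak allocated GiB, peak reserved GiB).
ROWS = [
    ("512x512", 39.60, 40.62),
    ("1024x1024", 41.21, 50.15),
    ("1280x768", 40.88, 49.52),
    ("768x1280", 40.88, 49.52),
    ("1536x640", 40.88, 49.52),
    ("2048x512", 41.21, 50.15),
]


def pixel_count(resolution: str) -> int:
    """Return the total number of pixels for a string such as '1280x768'."""
    first, second = resolution.lower().split("x")
    return int(first) * int(second)


def group_by_pixels(rows):
    """Map each pixel count to the set of (allocated, reserved) pairs seen for it."""
    groups = {}
    for resolution, allocated, reserved in rows:
        groups.setdefault(pixel_count(resolution), set()).add((allocated, reserved))
    return groups


groups = group_by_pixels(ROWS)
for pixels, stats in sorted(groups.items()):
    # One (allocated, reserved) pair per group means memory tracked pixel count.
    print(f"{pixels:>8} px -> {sorted(stats)}")
```

In this data, the three resolutions with 983,040 pixels share one memory footprint and the two with 1,048,576 pixels share another, so total pixel count looks like a better first estimate of memory needs than aspect ratio.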

Model Metadata

  • Pipeline class: BitDanceDiffusionPipeline
  • Diffusers version in config: 0.36.0
  • Parallel prediction factor: 64
  • Text stack: Qwen3ForCausalLM + Qwen2TokenizerFast
  • Supported resolutions include 1024x1024, 1280x768, 768x1280, 2048x512, and more (see model_index.json)
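
To check these values against a downloaded copy of the checkpoint, model_index.json can be read directly. The following is a minimal sketch: `_class_name` and `_diffusers_version` are the standard keys Diffusers writes to that file, while the local directory name and the `summarize_model_index` helper are assumptions for illustration. The key that holds the supported resolutions is not named in this card, so the sketch only lists the remaining top-level keys for manual inspection.

```python
import json
from pathlib import Path


def summarize_model_index(model_dir):
    """Read model_index.json from a local checkpoint directory and summarize it."""
    index_path = Path(model_dir) / "model_index.json"
    with index_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return {
        "pipeline_class": config.get("_class_name"),
        "diffusers_version": config.get("_diffusers_version"),
        # Remaining entries: pipeline components and any model-specific settings.
        "other_keys": sorted(k for k in config if not k.startswith("_")),
    }


# Hypothetical local path; point this at wherever the checkpoint was downloaded.
model_dir = Path("./BitDance-14B-64x-diffusers")
if (model_dir / "model_index.json").exists():
    print(summarize_model_index(model_dir))
```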

Citation

If you use this model, please cite BitDance and Diffusers:

@article{ai2026bitdance,
  title   = {BitDance: Scaling Autoregressive Generative Models with Binary Tokens},
  author  = {Ai, Yuang and Han, Jiaming and Zhuang, Shaobin and Hu, Xuefeng and Yang, Ziyan and Yang, Zhenheng and Huang, Huaibo and Yue, Xiangyu and Chen, Hao},
  journal = {arXiv preprint arXiv:2602.14041},
  year    = {2026}
}

@misc{von-platen-etal-2022-diffusers,
  title        = {Diffusers: State-of-the-art diffusion models},
  author       = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Dhruv Nain and Sayak Paul and William Berman and Yiyi Xu and Steven Liu and Thomas Wolf},
  year         = {2022},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/diffusers}}
}

License

This repository is distributed under the Apache-2.0 license, consistent with the upstream BitDance release.
