Add files for diffusers loading

#20

by dg845 - opened 7 days ago

base: refs/heads/main ← from: refs/pr/20

+25 -0

dg845

7 days ago

Pull Request opened with the huggingface_hub Python library

Add diffusers scheduler for diffusers loading (46713173)

dg845 changed pull request title from Add diffusers scheduler for diffusers loading to Add files for diffusers loading 7 days ago

Add diffusers model index for diffusers loading (93998888)

dg845

7 days ago

This PR adds a diffusers scheduler/ subdirectory and a model_index.json file to support diffusers loading. Once it is merged (or when loading from the PR revision, as below), the following should work:

import torch
from diffusers import DiffusionGemmaPipeline

model_id = "google/diffusiongemma-26B-A4B-it"
pipe = DiffusionGemmaPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision="refs/pr/20",   # Omit if merged to main
)
pipe.to("cuda")

output = pipe(
    prompt="Why is the sky blue?",
    gen_length=256,
    num_inference_steps=48,
    cache_implementation="static",
    generator=torch.Generator("cuda").manual_seed(42),
)
print(output.texts[0])
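
Before loading, it can help to confirm that the revision really contains the two additions described above. The sketch below is a minimal check and not part of the PR: missing_diffusers_files is a hypothetical helper, the example file list is made up, and treating any file under scheduler/ as the scheduler config is an assumption. In practice the file list would typically come from huggingface_hub.list_repo_files(model_id, revision="refs/pr/20").

```python
def missing_diffusers_files(repo_files):
    """Return the diffusers entries absent from a list of repo file paths.

    Hypothetical helper: it only checks for the two additions named in this PR
    (a top-level model_index.json and at least one file under scheduler/).
    """
    files = set(repo_files)
    missing = []
    if "model_index.json" not in files:
        missing.append("model_index.json")
    if not any(path.startswith("scheduler/") for path in files):
        missing.append("scheduler/")
    return missing


# Made-up file list for illustration only; pass the real listing of the
# PR revision in practice.
example_files = ["config.json", "model_index.json", "scheduler/scheduler_config.json"]
print(missing_diffusers_files(example_files))  # prints []
```

An empty list means both entries are present; anything else names what the revision still lacks.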

dg845

6 days ago

I think there may be a bug where calling DiffusionGemmaPipeline.from_pretrained on its own does not fetch all of the modeling files. For now, a workaround is to call snapshot_download first so that every file is already available locally:

import torch
from huggingface_hub import snapshot_download
from diffusers import DiffusionGemmaPipeline

model_id = "google/diffusiongemma-26B-A4B-it"
snapshot_download(repo_id=model_id, revision="refs/pr/20")

pipe = DiffusionGemmaPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision="refs/pr/20",   # Omit if merged to main
)
pipe.to("cuda")

output = pipe(
    prompt="Why is the sky blue?",
    gen_length=256,
    num_inference_steps=48,
    cache_implementation="static",
    generator=torch.Generator("cuda").manual_seed(42),
)
print(output.texts[0])
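
To see which files the workaround actually fetched, the downloaded snapshot directory can be listed. This is a small diagnostic sketch under stated assumptions: list_snapshot_files is a hypothetical helper, not something from diffusers or huggingface_hub. It relies only on the fact that snapshot_download returns the path of the local snapshot folder.

```python
from pathlib import Path


def list_snapshot_files(local_dir):
    """Return sorted, repo-relative paths of all files under a local snapshot directory."""
    root = Path(local_dir)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# Usage (needs network access and huggingface_hub, so left commented out):
# from huggingface_hub import snapshot_download
# local_dir = snapshot_download(
#     repo_id="google/diffusiongemma-26B-A4B-it", revision="refs/pr/20"
# )
# print(list_snapshot_files(local_dir))
```

Comparing this listing with the files shown for the PR revision on the Hub makes it easier to tell whether any modeling file was skipped.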

dg845

6 days ago

Closing the PR for now; we will reopen it once we have addressed the issue described in https://huggingface.co/google/diffusiongemma-26B-A4B-it/discussions/20#6a43020c48cb38f28686f686.

dg845 changed pull request status to closed 6 days ago

dg845 changed pull request status to open 3 days ago

dg845

3 days ago

Reopening the PR. After this diffusers PR, the issue should be addressed and from_pretrained should work as expected without first needing a snapshot_download.

Ready to merge

This branch is ready to get merged automatically.
