# CLIP ViT-B/32 LAION — ONNX INT8

An INT8-quantized ONNX export of [laion/CLIP-ViT-B-32-laion2B-s34B-b79K](https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K), optimized for CPU-only inference (no GPU required at serving time).

## Files

| File | Description |
| --- | --- |
| `vision_encoder_int8.onnx` | Vision (image) encoder — INT8 quantized |
| `text_encoder_int8.onnx` | Text encoder — INT8 quantized |
| `projections.npy` | Visual + text projection weights (FP32) |
| `tokenizer_config.json`, etc. | Processor / tokenizer config |
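
`projections.npy` is a pickled dict; judging from the quick-start code below, it holds a `visual_projection` and a `text_projection` matrix. A quick way to inspect it (for ViT-B/32 with the usual Hugging Face weight layout you would expect shapes of (512, 768) and (512, 512), but verify on your download):

```python
import numpy as np

# The .npy file stores a pickled dict, so allow_pickle is required
proj = np.load("projections.npy", allow_pickle=True).item()
for name, w in proj.items():
    print(name, w.shape, w.dtype)
```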

## Performance vs. original

| Metric | FP32 original | ONNX INT8 |
| --- | --- | --- |
| Disk | ~600 MB | ~150 MB |
| RAM | ~1.8 GB | ~500 MB |
| Image embed (CPU) | ~800 ms | ~200 ms |
| Text embed (CPU) | ~300 ms | ~80 ms |
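
Latency varies a lot with CPU model and thread settings, so treat the numbers above as ballpark figures. A minimal sketch to reproduce rough single-image timings on your own machine, assuming the vision encoder takes 224×224 `pixel_values` as in the quick start below:

```python
import time
import numpy as np
import onnxruntime as ort

so = ort.SessionOptions()
so.intra_op_num_threads = 4  # tune for your CPU; the default uses all cores
sess = ort.InferenceSession("vision_encoder_int8.onnx", sess_options=so,
                            providers=["CPUExecutionProvider"])

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy pixel_values
sess.run(None, {"pixel_values": x})  # warm-up run

t0 = time.perf_counter()
for _ in range(20):
    sess.run(None, {"pixel_values": x})
print(f"{(time.perf_counter() - t0) / 20 * 1000:.1f} ms / image")
```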

## Quick start
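
The example assumes `onnxruntime`, `transformers`, `huggingface_hub`, `numpy`, and `Pillow` are installed (`pip install onnxruntime transformers huggingface_hub pillow`).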

```python
from huggingface_hub import snapshot_download
import onnxruntime as ort
import numpy as np
from transformers import CLIPProcessor
from PIL import Image

model_dir = snapshot_download("rdxtremity/clip-laion-b32-onnx-int8")
processor = CLIPProcessor.from_pretrained(model_dir)

# Pickled dict of FP32 projection matrices, applied after each encoder
proj = np.load(f"{model_dir}/projections.npy", allow_pickle=True).item()

vision_sess = ort.InferenceSession(f"{model_dir}/vision_encoder_int8.onnx")
text_sess   = ort.InferenceSession(f"{model_dir}/text_encoder_int8.onnx")

# Embed an image
img = Image.open("product.jpg").convert("RGB")
inp = processor(images=img, return_tensors="np")
out = vision_sess.run(["pooler_output"], {"pixel_values": inp["pixel_values"].astype(np.float32)})
img_vec = out[0] @ proj["visual_projection"].T  # project into the shared embedding space
img_vec /= np.linalg.norm(img_vec)              # L2-normalize

# Embed text (Arabic + English supported); the query means "red athletic shoes"
inp = processor(text="حذاء رياضي أحمر", return_tensors="np", padding="max_length", truncation=True, max_length=77)
out = text_sess.run(["pooler_output"], {"input_ids": inp["input_ids"].astype(np.int64), "attention_mask": inp["attention_mask"].astype(np.int64)})
txt_vec = out[0] @ proj["text_projection"].T
txt_vec /= np.linalg.norm(txt_vec)
```
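
Since both embeddings are L2-normalized, cosine similarity is just a dot product. A minimal retrieval sketch continuing from the variables above (`image_vecs` is a hypothetical matrix of unit-norm image embeddings):

```python
# Cosine similarity between the image and the text query (both are (1, 512), unit-norm)
score = (img_vec @ txt_vec.T).item()
print(f"similarity: {score:.3f}")

# To rank N candidate images against one text query, stack their unit-norm
# vectors into image_vecs of shape (N, 512) and sort by dot product:
# order = np.argsort(-(image_vecs @ txt_vec.T).ravel())
```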