OmniEmb-v1 is a compact multi-modal embedding model that maps text and images into a single shared embedding space, so items can be retrieved across modalities without first passing them through an intermediate VLM.
Install package:
pip install sportsvision
Basic usage:
import torch
from sportsvision.research.configs import UnifiedEmbedderConfig
from sportsvision.research.models import UnifiedEmbedderModel
from transformers import AutoConfig, AutoModel
from PIL import Image
# Register the custom configuration and model
AutoConfig.register("unified_embedder", UnifiedEmbedderConfig)
AutoModel.register(UnifiedEmbedderConfig, UnifiedEmbedderModel)
# Initialize the model from the pretrained repository
emb_model = AutoModel.from_pretrained("sportsvision/omniemb-v1")
# Determine the device
device = "cuda" if torch.cuda.is_available() else "cpu"
# Move the model to the device
emb_model = emb_model.to(device)
# Set the model to evaluation mode
emb_model.eval()
# Sample texts
texts = [
    "Playoff season is exciting!",
    "Injury updates for the team.",
]
# Encode texts to obtain embeddings
text_embeddings = emb_model.encode_texts(texts)
print("Text Embeddings:", text_embeddings)
# Sample images
image_paths = [
    "path_to_image1.jpg",
    "path_to_image2.jpg",
]
# Load images using PIL
images = [Image.open(img_path).convert('RGB') for img_path in image_paths]
# Encode images to obtain embeddings
image_embeddings = emb_model.encode_images(images)
print("Image Embeddings:", image_embeddings)
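Cross-modal retrieval example:
Because text and image embeddings share one space, a text query can be matched against images by cosine similarity. The sketch below is illustrative rather than part of the sportsvision package: `rank_images_for_texts` is a hypothetical helper, and it assumes that `encode_texts` and `encode_images` return 2-D torch tensors of shape (num_items, embedding_dim). Random tensors stand in for real embeddings so the snippet runs on its own.

```python
import torch
import torch.nn.functional as F


def rank_images_for_texts(text_embeddings, image_embeddings, top_k=1):
    """Hypothetical helper: for each text, return the top_k most similar images.

    Assumes both inputs are 2-D tensors of shape (num_items, embedding_dim).
    Returns a (scores, indices) pair, each of shape (num_texts, k).
    """
    # L2-normalize so the dot product equals cosine similarity
    text_norm = F.normalize(text_embeddings.float(), dim=-1)
    image_norm = F.normalize(image_embeddings.float(), dim=-1)
    # Similarity matrix of shape (num_texts, num_images)
    similarity = text_norm @ image_norm.T
    # Never ask for more results than there are images
    k = min(top_k, image_norm.shape[0])
    return similarity.topk(k, dim=-1)


# Stand-in tensors; in practice pass text_embeddings and image_embeddings
torch.manual_seed(0)
demo_text_embeddings = torch.randn(2, 8)
demo_image_embeddings = torch.randn(3, 8)

scores, indices = rank_images_for_texts(
    demo_text_embeddings, demo_image_embeddings, top_k=2
)
print("Best image indices per text:", indices.tolist())
print("Cosine similarities:", scores.tolist())
```

If the encoders return tensors on a GPU, the same helper works unchanged as long as both inputs are on the same device.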
If you use this model in your research, please cite:
@misc{kodathala2024omniemb,
author = {Kodathala, Varun},
title = {OmniEmb-v1: Multi-Modal Embeddings for Unified Retrieval},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/sportsvision/omniemb-v1}}
}
Base model: openai/clip-vit-large-patch14