Instructions for using openbmb/MiniCPM-Embedding-Light with the Transformers and sentence-transformers libraries.
- Libraries
- Transformers
How to use openbmb/MiniCPM-Embedding-Light with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True, dtype="auto")
- sentence-transformers
How to use openbmb/MiniCPM-Embedding-Light with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
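The sentence-transformers snippet above prints a `[3, 3]` shape because `similarity` compares every sentence with every other sentence. As a rough illustration, and assuming the similarity function is cosine similarity (the sentence-transformers default unless a model configures otherwise), the same pairwise matrix can be sketched in plain Python. The vectors below are made-up stand-ins, not real model embeddings, and `cosine_similarity_matrix` is a hypothetical helper name.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity matrix for a list of vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (nu * nv)
            for v, nv in zip(vectors, norms)
        ]
        for u, nu in zip(vectors, norms)
    ]


# Toy 4-dimensional vectors standing in for real embeddings (made-up values).
toy_embeddings = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.3],
    [0.0, 0.9, 0.4, 0.1],
]

sim = cosine_similarity_matrix(toy_embeddings)
print(len(sim), len(sim[0]))  # 3 3
```

Each diagonal entry is 1 (a vector compared with itself), and the matrix is symmetric, which is why three input sentences yield a 3 by 3 result.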
from transformers import AutoModel
import torch
model_name = "openbmb/MiniCPM-Embedding-Light"
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.float16).to("cuda")
# Optionally, use flash_attention_2 for faster inference (requires the flash-attn package)
# model = AutoModel.from_pretrained(model_name, trust_remote_code=True, attn_implementation="flash_attention_2", torch_dtype=torch.float16).to("cuda")
model.eval()
queries = ["MiniCPM-o 2.6 A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone"]
passages = ["MiniCPM-o 2.6 is the latest and most capable model in the MiniCPM-o series. The model is built in an end-to-end fashion based on SigLip-400M, Whisper-medium-300M, ChatTTS-200M, and Qwen2.5-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.6, and introduces new features for real-time speech conversation and multimodal live streaming."]
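# Encode queries and passages; return_sparse_vectors=True returns sparse vectors alongside the dense embeddings
embeddings_query_dense, embeddings_query_sparse = model.encode_query(queries, return_sparse_vectors=True, max_length=8192, dense_dim=1024)
embeddings_doc_dense, embeddings_doc_sparse = model.encode_corpus(passages, return_sparse_vectors=True)
# Dense score: matrix product of query embeddings with transposed passage embeddings
dense_scores = (embeddings_query_dense @ embeddings_doc_dense.T)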
print(dense_scores.tolist()) # [[0.6512398719787598]]
print(model.compute_sparse_score_dicts(embeddings_query_sparse, embeddings_doc_sparse)) # [[0.27202296]]
# Alternatively, compute dense, sparse, and mixed scores in a single call
dense_scores, sparse_scores, mixed_scores = model.compute_score(queries, passages)
print(dense_scores) # [[0.65123993]]
print(sparse_scores) # [[0.27202296]]
print(mixed_scores) # [[0.73284686]]
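The three printed scores are numerically consistent with a mixed score of the form `dense + 0.3 * sparse`. The sketch below reproduces that arithmetic. Note that the 0.3 weight is inferred from the printed numbers rather than stated in the source, and the `sparse_score` helper shows only one common way to score sparse token-weight dictionaries (summing weight products over shared tokens); it is not confirmed to be what `compute_sparse_score_dicts` does. The helper names and the toy token weights are hypothetical.

```python
def sparse_score(query_weights, doc_weights):
    """Sum of weight products over tokens that appear in both dicts.

    Assumption: sparse vectors are token -> weight mappings. This is a
    common lexical-matching formulation, not the model's confirmed method.
    """
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )


def mix_scores(dense, sparse, sparse_weight=0.3):
    """Combine scores linearly; the 0.3 default is inferred, not documented."""
    return dense + sparse_weight * sparse


# Toy token weights (made-up values) to show the sparse scoring idea.
toy_query = {"minicpm": 0.5, "phone": 0.2}
toy_doc = {"minicpm": 0.4, "speech": 0.3}
toy_sparse = sparse_score(toy_query, toy_doc)  # only "minicpm" is shared

# Values printed by the script above.
dense = 0.65123993
sparse = 0.27202296
mixed = mix_scores(dense, sparse)
print(round(mixed, 4))  # 0.7328
```

The result matches the printed mixed score of 0.73284686 to four decimal places, which supports the inferred weighting but does not prove it for other inputs.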