SentenceTransformer based on intfloat/multilingual-e5-large

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-large. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-large
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
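
Modules (1) and (2) above amount to a masked mean over the token vectors followed by L2 normalization. The following is a minimal pure-Python sketch of that arithmetic on toy 2-dimensional vectors, not the library's implementation; the helper names `mean_pool`, `l2_normalize`, and `dot` are illustrative assumptions, and the real model works on 1024-dimensional vectors.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy input: three token vectors, the last one is padding (mask 0).
tokens = [[2.0, 0.0], [4.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
print(pooled)                               # [3.0, 3.0]
print(round(dot(embedding, embedding), 6))  # 1.0
```

Because every embedding leaves the model with unit length, the dot product of two embeddings equals their cosine similarity.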

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Barakuga/me5-checkthat-task1")
# Run inference
sentences = [
    'query: @user It’s quite possibly the reverse (Pfizer @ 39%) “Our data show that anti-disease vaccines that do not prevent transmission can create conditions that promote the emergence of pathogen strains that cause more severe disease in unvaccinated hosts” Source:',
    "passage: title: Imperfect Vaccination Can Enhance the Transmission of Highly Virulent Pathogens abstract: Could some vaccines drive the evolution of more virulent pathogens? Conventional wisdom is that natural selection will remove highly lethal pathogens if host death greatly reduces transmission. Vaccines that keep hosts alive but still allow transmission could thus allow very virulent strains to circulate in a population. Here we show experimentally that immunization of chickens against Marek's disease virus enhances the fitness of more virulent strains, making it possible for hyperpathogenic strains to transmit. Immunity elicited by direct vaccination or by maternal vaccination prolongs host survival but does not prevent infection, viral replication or transmission, thus extending the infectious periods of strains otherwise too lethal to persist. Our data show that anti-disease vaccines that do not prevent transmission can create conditions that promote the emergence of pathogen strains that cause more severe disease in unvaccinated hosts.",
    'passage: title: Access to lifesaving medical resources for African countries: COVID-19 testing and response, ethics, and politics abstract: Coronavirus disease 2019 (COVID-19) has revealed how strikingly unprepared the world is for a pandemic and how easily viruses spread in our interconnected world. A governance crisis is unfolding alongside the pandemic as health officials around the world compete for access to scarce medical supplies. As governments of African countries, and those in low-income and middle-income countries around the world, seek to avoid potentially catastrophic epidemics and learn from what has worked in other countries, testing and other medical resources are of concern.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.7027, -0.0760],
#         [ 0.7027,  1.0000, -0.0821],
#         [-0.0760, -0.0821,  1.0000]])
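
The example inputs above carry a "query: " prefix on the claim and a "passage: title: ... abstract: ..." layout on the documents. The sketch below shows one way to build such strings and to rank passages for a query from similarity scores. The helper names `format_query`, `format_passage`, and `rank_passages` are hypothetical and not part of the library, and the exact text layout is inferred from the examples in this card rather than from a documented specification.

```python
def format_query(text: str) -> str:
    """Prefix a claim or post the way the example query above is prefixed."""
    return f"query: {text}"

def format_passage(title: str, abstract: str) -> str:
    """Combine title and abstract in the layout seen in the example passages."""
    return f"passage: title: {title} abstract: {abstract}"

def rank_passages(query_scores):
    """Return passage indices ordered from most to least similar."""
    return sorted(range(len(query_scores)), key=lambda i: query_scores[i], reverse=True)

# Scores of the query against the two passages, copied from row 0 of the
# similarity matrix printed above (columns 1 and 2).
scores = [0.7027, -0.0760]
print(rank_passages(scores))  # [0, 1]
```

In this example the paper on imperfect vaccination ranks first for the quoted claim, and the unrelated COVID-19 testing paper ranks last.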

Training Details

Training Dataset

Unnamed Dataset

  • Size: 19,244 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (type: string): min 21 tokens, mean 59.89 tokens, max 134 tokens
    • sentence_1 (type: string): min 30 tokens, mean 336.42 tokens, max 512 tokens
  • Samples:
    • sentence_0: query: In what way will Language Modelers such as ChatGPT Impact Jobs and Sectors? by Edward W. Felten, Manav Raj, Robert Seamans :: SSRN
      sentence_1: passage: title: How will Language Modelers like ChatGPT Affect Occupations and Industries? abstract: Recent dramatic increases in AI language modeling capabilities has led to many questions about the effect of these technologies on the economy. In this paper we present a methodology to systematically assess the extent to which occupations, industries and geographies are exposed to advances in AI language modeling capabilities. We find that the top occupations exposed to language modeling include telemarketers and a variety of post-secondary teachers such as English language and literature, foreign language and literature, and history teachers. We find the top industries exposed to advances in language modeling are legal services and securities, commodities, and investments. We also find a positive correlation between wages and exposure to AI language modeling.
    • sentence_0: query: Spannende Studie zu #POTS. Sie verdeutlicht, was man ärztlich häufig wahrnimmt, nämlich dass die geistige Leistungsfähigkeit beim Sitzen und Stehen nachlässt. In diesem Fall waren Konzentration und Ausführungsfunktion gegen Kontrollen vermindert. 1/6
      sentence_1: passage: title: Cognitive functioning in postural orthostatic tachycardia syndrome among different body positions: a prospective pilot study (POTSKog study) abstract: Approximately 96% of patients with postural orthostatic tachycardia syndrome (PoTS) report cognitive complaints. We investigated whether cognitive function is impaired during sitting and active standing in 30 patients with PoTS compared with 30 healthy controls (HCs) and whether it will improve with the counter manoeuvre of leg crossing.In this prospective pilot study, patients with PoTS were compared to HCs matched for age, sex, and educational level. Baseline data included norepinephrine plasma levels, autonomic testing and baseline cognitive function in a seated position [the Montreal Cognitive Assessment, the Leistungsprüfsystem (LPS) subtests 1 and 2, and the Test of Attentional Performance (TAP)]. Cognitive functioning was examined in a randomized order in supine, upright and upright legs crossed position. The prima...
    • sentence_0: query: We now know that Omicron is far from mild. In the unvaccinated it is equally lethal, while being more contagious, as other strains. Most children were and remain not vaccinated. We were aware that in winter 2021. And we know it now.
      sentence_1: passage: title: Intrinsic and effective severity of COVID-19 cases infected with the ancestral strain and Omicron BA.2 variant in Hong Kong abstract: ABSTRACT Background Understanding severity of infections with SARS-CoV-2 and its variants is crucial to inform public health measures. Here we used COVID-19 patient data from Hong Kong to characterise the severity profile of COVID-19 and to examine factors associated with fatality of infection. Methods Time-varying and age-specific effective severity measured by case-hospitalization risk and hospitalization risk was estimated with all individual COVID-19 case data collected in Hong Kong from 23 January 2020 through to 26 October 2022 over six epidemic waves, in comparison with estimates of influenza A(H1N1)pdm09 during the 2009 pandemic. The intrinsic severity of Omicron BA.2 was compared with the estimate for the ancestral strain with the data from unvaccinated patients without previous infections. Factors potentially associated with the...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
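
Under the parameters listed above (cosine similarity, scale 20.0, direction query_to_doc), the loss can be read as a cross-entropy in which each query must pick its own passage out of all passages in the batch. The following is a minimal pure-Python sketch of that reading on toy vectors, not the library's code; `cos_sim` and `mnr_loss` are illustrative names, and options such as `partition_mode` and `hardness_mode` are not modeled.

```python
import math

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(queries, docs, scale=20.0):
    """Mean cross-entropy where docs[i] is the positive for queries[i]
    and every other doc in the batch serves as a negative."""
    losses = []
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, doc) for doc in docs]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own doc
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other doc

print(round(mnr_loss(queries, aligned), 4))  # 0.0
print(round(mnr_loss(queries, swapped), 4))  # 20.0
```

With a per-device batch size of 4, each query in this sketch would see one positive and three in-batch negatives per step.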
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • num_train_epochs: 1
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
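
The combination of `lr_scheduler_type: linear`, `warmup_steps: 0`, and `learning_rate: 5e-05` describes a learning rate that decays linearly from its initial value to zero. The sketch below illustrates that schedule in plain Python. The function name `linear_lr` is hypothetical, and the default of 4811 total steps is an assumption derived from 19,244 samples at batch size 4 on a single device for one epoch.

```python
def linear_lr(step, base_lr=5e-5, total_steps=4811, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining

print(linear_lr(0))     # 5e-05
print(linear_lr(4811))  # 0.0
```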

Training Logs

Epoch    Step   Training Loss
0.1039    500   0.1880
0.2079   1000   0.1486
0.3118   1500   0.1368
0.4157   2000   0.1392
0.5196   2500   0.1169
0.6236   3000   0.1305
0.7275   3500   0.1070
0.8314   4000   0.1079
0.9354   4500   0.1064
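
The Epoch column in the log above can be reproduced from the dataset size and batch size reported earlier in this card. The short calculation below assumes a single training device and relies on `gradient_accumulation_steps: 1`; the names `steps_per_epoch` and `epoch_at` are illustrative.

```python
import math

num_samples = 19244  # training samples reported in this card
batch_size = 4       # per_device_train_batch_size, assuming one device

steps_per_epoch = math.ceil(num_samples / batch_size)

def epoch_at(step):
    """Fraction of the epoch completed at a given optimizer step."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch)                             # 4811
print([epoch_at(s) for s in (500, 1000, 1500)])    # [0.1039, 0.2079, 0.3118]
```

Under these assumptions the single epoch spans 4811 steps, so the last logged step (4500) falls shortly before the end of training.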

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.3.0
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}

Model Size

  • Format: Safetensors
  • Model size: 0.6B params
  • Tensor type: F32