e3babb8015398f22a63fe6901713d68e

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [fr-sv] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.4856
  • Data Size: 1.0
  • Epoch Runtime: 13.2890
  • Bleu: 2.4818
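
The card does not include a usage example. Below is a minimal inference sketch, assuming the Hub repo id contemmcm/e3babb8015398f22a63fe6901713d68e and that the checkpoint is fed the raw French source sentence (the input format used during fine-tuning is not documented here, so the absence of a task prefix is an assumption):

```python
# Minimal inference sketch for French -> Swedish translation.
# Assumptions: the repo id below, and that no task prefix is needed
# (umT5 is pre-trained without task prompts, but the fine-tuning
# input format is not documented in this card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/e3babb8015398f22a63fe6901713d68e"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Le chat dort sur le canapé.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the low final BLEU (about 2.5), outputs should be treated as illustrative rather than production-quality translations.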

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
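
These settings map onto transformers' Seq2SeqTrainingArguments roughly as sketched below. The actual training script is not published, so this is illustrative only: the output_dir name is hypothetical, and the per-device batch size of 8 across 4 GPUs accounts for the total batch size of 32 listed above.

```python
# Sketch of Seq2SeqTrainingArguments matching the listed hyperparameters.
# Illustrative only; output_dir is a hypothetical name.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-fr-sv",  # hypothetical
    learning_rate=5e-05,
    per_device_train_batch_size=8,  # x4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # x4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # required to report BLEU at evaluation
)
```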

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:--------------|:------|:-----|:----------------|:----------|:--------------|:-------|
| No log        | 0     | 0    | 18.1002         | 0         | 1.7283        | 0.0562 |
| No log        | 1     | 75   | 18.1645         | 0.0078    | 1.9587        | 0.0345 |
| No log        | 2     | 150  | 17.2923         | 0.0156    | 2.3066        | 0.0580 |
| No log        | 3     | 225  | 16.4469         | 0.0312    | 2.9083        | 0.0569 |
| No log        | 4     | 300  | 15.5651         | 0.0625    | 3.3489        | 0.0556 |
| No log        | 5     | 375  | 14.3630         | 0.125     | 4.2416        | 0.0595 |
| No log        | 6     | 450  | 12.1033         | 0.25      | 5.4460        | 0.0447 |
| No log        | 7     | 525  | 9.0282          | 0.5       | 8.2194        | 0.0658 |
| 8.9277        | 8.0   | 600  | 6.5578          | 1.0       | 13.9101       | 0.1856 |
| 7.7238        | 9.0   | 675  | 5.2655          | 1.0       | 14.2897       | 0.4651 |
| 6.5062        | 10.0  | 750  | 4.9267          | 1.0       | 13.0144       | 0.5096 |
| 6.2022        | 11.0  | 825  | 4.7138          | 1.0       | 13.0152       | 0.7079 |
| 5.8041        | 12.0  | 900  | 4.5339          | 1.0       | 12.9225       | 1.2388 |
| 5.5943        | 13.0  | 975  | 4.4047          | 1.0       | 12.8617       | 1.5846 |
| 5.3682        | 14.0  | 1050 | 4.2823          | 1.0       | 12.8135       | 1.9420 |
| 5.2353        | 15.0  | 1125 | 4.1989          | 1.0       | 12.8609       | 1.0069 |
| 5.0499        | 16.0  | 1200 | 4.0935          | 1.0       | 13.3976       | 1.1048 |
| 4.9827        | 17.0  | 1275 | 4.0182          | 1.0       | 12.6508       | 1.2535 |
| 4.8603        | 18.0  | 1350 | 3.9544          | 1.0       | 12.9864       | 1.3214 |
| 4.787         | 19.0  | 1425 | 3.9019          | 1.0       | 13.4680       | 1.4019 |
| 4.6777        | 20.0  | 1500 | 3.8460          | 1.0       | 13.8981       | 1.4917 |
| 4.6164        | 21.0  | 1575 | 3.8066          | 1.0       | 14.8219       | 1.5288 |
| 4.5516        | 22.0  | 1650 | 3.7706          | 1.0       | 14.7882       | 1.6005 |
| 4.4776        | 23.0  | 1725 | 3.7467          | 1.0       | 14.7274       | 1.6067 |
| 4.4162        | 24.0  | 1800 | 3.7196          | 1.0       | 14.5843       | 1.7865 |
| 4.4035        | 25.0  | 1875 | 3.6955          | 1.0       | 12.8680       | 1.9358 |
| 4.3159        | 26.0  | 1950 | 3.6742          | 1.0       | 12.9960       | 1.8525 |
| 4.3001        | 27.0  | 2025 | 3.6600          | 1.0       | 13.4016       | 1.9151 |
| 4.2257        | 28.0  | 2100 | 3.6387          | 1.0       | 14.1796       | 2.0088 |
| 4.1948        | 29.0  | 2175 | 3.6233          | 1.0       | 15.2012       | 2.0336 |
| 4.1688        | 30.0  | 2250 | 3.6159          | 1.0       | 14.7225       | 2.0623 |
| 4.138         | 31.0  | 2325 | 3.6047          | 1.0       | 14.7262       | 2.1031 |
| 4.1102        | 32.0  | 2400 | 3.5928          | 1.0       | 15.0099       | 2.1592 |
| 4.0425        | 33.0  | 2475 | 3.5763          | 1.0       | 13.2776       | 2.1841 |
| 4.0315        | 34.0  | 2550 | 3.5702          | 1.0       | 13.7277       | 2.2052 |
| 3.9818        | 35.0  | 2625 | 3.5615          | 1.0       | 14.4340       | 2.2627 |
| 3.9634        | 36.0  | 2700 | 3.5535          | 1.0       | 14.3716       | 2.2744 |
| 3.9299        | 37.0  | 2775 | 3.5521          | 1.0       | 14.8856       | 2.2306 |
| 3.9164        | 38.0  | 2850 | 3.5422          | 1.0       | 14.8994       | 2.3079 |
| 3.8966        | 39.0  | 2925 | 3.5384          | 1.0       | 14.8891       | 2.3486 |
| 3.8606        | 40.0  | 3000 | 3.5302          | 1.0       | 15.0663       | 2.3331 |
| 3.822         | 41.0  | 3075 | 3.5221          | 1.0       | 15.6057       | 2.4048 |
| 3.7902        | 42.0  | 3150 | 3.5173          | 1.0       | 13.2500       | 2.3580 |
| 3.7706        | 43.0  | 3225 | 3.5143          | 1.0       | 13.1484       | 2.4666 |
| 3.7576        | 44.0  | 3300 | 3.5095          | 1.0       | 13.2237       | 2.3792 |
| 3.7302        | 45.0  | 3375 | 3.5057          | 1.0       | 13.7542       | 2.4260 |
| 3.6948        | 46.0  | 3450 | 3.4992          | 1.0       | 13.5008       | 2.4733 |
| 3.6967        | 47.0  | 3525 | 3.4913          | 1.0       | 13.9558       | 2.4751 |
| 3.6627        | 48.0  | 3600 | 3.4882          | 1.0       | 14.6121       | 2.5331 |
| 3.6272        | 49.0  | 3675 | 3.4887          | 1.0       | 16.1375       | 2.5500 |
| 3.6257        | 50.0  | 3750 | 3.4856          | 1.0       | 13.2890       | 2.4818 |
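
A BLEU score like those in the table can be computed with the evaluate library's sacrebleu wrapper; a minimal sketch follows (whether this exact metric configuration was used for the scores above is an assumption, and the example sentences are hypothetical):

```python
# Sketch of BLEU computation with evaluate/sacrebleu.
# The prediction and reference strings are hypothetical examples.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Katten sover på soffan."]  # model output (hypothetical)
references = [["Katten sover i soffan."]]  # one reference list per prediction
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))
```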

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1