- RL finetuning on this merge leads to model collapse (#11, opened 11 months ago by radna)
- Really like this model (#9, opened 12 months ago by SongXiaoMao)
- Add comparison with 70B distilled R1 model (#8, opened 12 months ago by blankohagen7)
- Update model card (#7, opened about 1 year ago by minpeter)
- Temperature's effect on the performance of long-chain reasoning models: why was 0.7 used for the evals? (#6, opened about 1 year ago by j456; 👍 1)
- License of your model (#4, opened about 1 year ago by chewkokwah; 🤝 1)
- Evaluation (#3, opened about 1 year ago by PSM24; 🤝 1)
- Merge with 32b coder? (#2, opened about 1 year ago by RDson; 👀 1, 14 replies)