Reproducing the BRIGHT result

by abdoelsayed - opened Mar 23


abdoelsayed

Mar 23

Hi, thanks for releasing Reason-ModernColBERT.

I’m trying to reproduce the Biology BRIGHT result reported in the model card (NDCG@10 = 33.25).

I evaluated Biology in two ways:

1. ANN candidate retrieval followed by ColBERT reranking
2. exact full ColBERT scoring against all Biology documents (no ANN pruning), with pytrec_eval-style metrics
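For reference, the second setup (exact scoring, no ANN pruning) boils down to ColBERT's MaxSim late interaction applied to every document in the corpus. Below is a minimal pure-Python sketch, assuming the query and document token embeddings have already been computed (typically L2-normalised). The function names are hypothetical and not taken from PyLate or the BRIGHT scripts.

```python
# Minimal sketch of exact ("brute-force") ColBERT late-interaction scoring.
# Assumption: token embeddings are precomputed and passed as plain float lists.
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """ColBERT MaxSim: for each query token take its best-matching doc token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def exact_rank(
    query_tokens: List[Vector],
    corpus: Dict[str, List[Vector]],
    k: int = 100,
) -> List[Tuple[str, float]]:
    """Score every document (no candidate pruning) and return the top-k (doc_id, score) pairs."""
    scores = [(doc_id, maxsim(query_tokens, toks)) for doc_id, toks in corpus.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]
```

With query tokens [[1, 0], [0, 1]], a document with tokens [[1, 0]] scores 1.0 and one with [[0.6, 0.8], [1, 0]] scores 1.8, because each query token picks its own best match. If the ANN pipeline and this exhaustive version agree (as they did here), the gap is unlikely to come from candidate pruning.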

In both cases I get NDCG@10 ≈ 0.295 instead of 0.3325.

Exact run:

task: biology
split: examples
NDCG@10: 0.29534
MRR@10: 0.38032
MAP@100: 0.23691
Recall@100: 0.72108
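To sanity-check numbers like these independently of the evaluation library, the per-query metrics can be recomputed by hand. This is a sketch assuming trec_eval-style definitions as I understand them (linear gain, log2(rank + 1) discount, ideal DCG built from all judged-relevant documents). It is not pytrec_eval itself, and averaging over every query in the qrels (counting a missing run as an empty ranking) is an assumption; a different choice of which queries to average over can shift the mean.

```python
import math
from typing import Callable, Dict, List

Ranking = List[str]     # doc ids, best first
Qrels = Dict[str, int]  # doc id -> relevance grade (0 = not relevant)


def ndcg_at_k(ranking: Ranking, qrels: Qrels, k: int = 10) -> float:
    """nDCG@k with linear gain and a log2(rank + 1) discount (1-based rank)."""
    dcg = sum(qrels.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranking[:k]))
    ideal = sorted((r for r in qrels.values() if r > 0), reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranking: Ranking, qrels: Qrels, k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for i, d in enumerate(ranking[:k]):
        if qrels.get(d, 0) > 0:
            return 1.0 / (i + 1)
    return 0.0


def recall_at_k(ranking: Ranking, qrels: Qrels, k: int = 100) -> float:
    """Fraction of relevant documents that appear in the top k."""
    relevant = {d for d, r in qrels.items() if r > 0}
    if not relevant:
        return 0.0
    return sum(1 for d in ranking[:k] if d in relevant) / len(relevant)


def mean_over_queries(
    metric: Callable[[Ranking, Qrels, int], float],
    runs: Dict[str, Ranking],
    qrels_all: Dict[str, Qrels],
    k: int,
) -> float:
    """Average a metric over every query in the qrels; missing runs count as empty rankings."""
    scores = [metric(runs.get(qid, []), qrels, k) for qid, qrels in qrels_all.items()]
    return sum(scores) / len(scores)
```

For example, mean_over_queries(ndcg_at_k, runs, qrels_all, 10) should land close to the NDCG@10 above if both tools define the metric the same way.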

My environment currently shows:

checkpoint created with sentence-transformers 4.0.2
runtime has sentence-transformers 3.4.1
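One way to flag this kind of mismatch programmatically: sentence-transformers checkpoints usually ship a config_sentence_transformers.json whose "__version__" entry records the library version that saved them. This sketch assumes that file layout and compares it with a runtime version string (e.g. sentence_transformers.__version__); the helper names are hypothetical.

```python
import json
import re
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Keep only the leading numeric release segment, e.g. "2.0.1+cu117" -> (2, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", v.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_st_version(config_text: str, runtime_version: str) -> str:
    """Compare the saving version recorded in the checkpoint config with the runtime version."""
    saved = json.loads(config_text)["__version__"]["sentence_transformers"]
    if parse_version(runtime_version) < parse_version(saved):
        return f"runtime {runtime_version} is older than checkpoint {saved}"
    return "ok"
```

In practice config_text would be read from the downloaded checkpoint directory. An older runtime does not necessarily change the scores, but it is a cheap variable to rule out by upgrading.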

So I wanted to ask:

What exact evaluation script/repo was used for the published BRIGHT numbers?

NohTow

LightOn AI org · Mar 24 (edited Mar 24)

Hello,

The reproduction setup and the exact script used can be found here.

I also recently reproduced the results and merged them into MTEB using the official MTEB implementation; see here for more information.
Some results differ slightly, but most of the difference is expected: back then we did not fix the randomness, so the indexes vary from run to run; we also switched index (StanfordNLP -> FastPlaid); and the dataset has since changed versions. The overall values are consistent, though.
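On the randomness point, here is a minimal sketch of seeding the global Python, NumPy and PyTorch generators before encoding and indexing. It assumes the index build draws from these global generators. Whether FastPlaid or the StanfordNLP index exposes its own seed parameter is not covered here and would need checking in their docs; set_seeds is a hypothetical helper.

```python
import random


def set_seeds(seed: int) -> None:
    """Seed the global RNGs that index-building code commonly draws from."""
    random.seed(seed)  # Python's built-in generator
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds PyTorch's generators on all devices
    except ImportError:
        pass  # PyTorch not installed: nothing to seed
```

Calling set_seeds(42) once at the start of a run should make repeated runs draw the same numbers from these generators, though some GPU kernels can still be non-deterministic.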

Hope it helps!
