Good model, but with limitations - Good at English credentials but has worse performance in non-English ones

#6
by dof-studio-org - opened

PAY ATTENTION GUYS:

Good at English credentials but has worse performance in non-English ones

For example, if an address contains non-Latin characters, detection may fail!

e.g. Baby、今夜大和ホテル鳳舞館に行くよ。そこの304で待っててね

In my view, the exact room number should be masked, or there should at least be an option to mask it.

But IN ENGLISH

Babe I'm gonna be at the Yamato Hotel Hobukan Room tonight. Room 304—wait for me there!

THE ADDRESS IS MASKED. You can test this case.
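For anyone who wants to reproduce the comparison, here is a minimal sketch of a check that asks whether a given substring (the room number) ends up inside any detected span. The `detect` callable is an assumption: plug in whatever wrapper you have around the model. The `toy_english_detector` below is a hypothetical regex stand-in, not the model from this discussion; it only shows how an English-only detector passes one case and misses the other.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def is_masked(text: str, target: str, detect: Callable[[str], List[Span]]) -> bool:
    """Return True if the first occurrence of `target` lies inside one detected span."""
    start = text.find(target)
    if start < 0:
        raise ValueError(f"{target!r} not found in text")
    end = start + len(target)
    return any(s <= start and end <= e for s, e in detect(text))


def toy_english_detector(text: str) -> List[Span]:
    """Hypothetical stand-in: flags 'Room <digits>' only. Replace with a real model call."""
    return [m.span() for m in re.finditer(r"Room \d+", text)]


# The two test sentences from this thread, copied verbatim.
CASES = {
    "en": "Babe I'm gonna be at the Yamato Hotel Hobukan Room tonight. Room 304—wait for me there!",
    "ja": "Baby、今夜大和ホテル鳳舞館に行くよ。そこの304で待っててね",
}

results = {
    lang: is_masked(text, "304", toy_english_detector)
    for lang, text in CASES.items()
}
print(results)
```

Swapping `toy_english_detector` for a function that returns the model's predicted character spans turns this into a quick regression check for both sentences.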

Just a warning for anyone who wants to use it in multilingual settings. :)

Not just worse, it seems to ignore other languages altogether.

@dof-studio-org , you are hereby greeted,

The model card states this (as of commit 7ffa9a043d54d1be65afb281eddf0ffbe629385b on file README.md):

  • Line 148: - Language(s): Primarily English; selected multilingual robustness evaluation reported
  • Line 167: Performance may drop on non-English text, non-Latin scripts, protected-group naming patterns, or domains that are out of distribution compared to model training.

Your example (Baby、今夜大和ホテル鳳舞館に行くよ。そこの304で待っててね) will therefore fail because:

  • L148 reports that the model was trained primarily on English, so other languages (including Japanese) are not guaranteed to work
  • L167 reports that performance "may" (it's better to interpret this as "will") drop on non-English text and on non-Latin scripts, explicitly mentioning "poor" (or no) support for languages like Japanese, Arabic and Thai.
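Given that caveat, one possible guard is to flag inputs containing non-Latin letters before trusting the output of an English-primary model. The sketch below is an assumption on my part, not something from the model card: the function name `non_latin_letters` is hypothetical, and the heuristic relies only on Unicode character names from Python's standard `unicodedata` module.

```python
import unicodedata
from typing import List


def non_latin_letters(text: str) -> List[str]:
    """Return the alphabetic characters whose Unicode name does not start with 'LATIN'.

    Heuristic only: fullwidth Latin letters are also reported, because their
    names start with 'FULLWIDTH LATIN'.
    """
    found = []
    for ch in text:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            found.append(ch)
    return found


english = "Room 304"
japanese = "そこの304で"

print(non_latin_letters(english))   # no non-Latin letters expected
print(non_latin_letters(japanese))  # the hiragana characters
```

A non-empty result could be routed to manual review or to a multilingual model instead of being silently passed through.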

You are hereby bidden farewell.

Nvidia's model is more powerful and even multilingual, and it was originally trained as a classifier. You can try it here.

By the way, your example works on it—it handles it just fine.

We released the model with some support for other languages, but with the idea that the model can be fine-tuned to support them (as well as other labels), much faster.

We can improve in the next version.

I wouldn't call it "just fine". But we should not get into these types of comparisons. Running a benchmark against real datasets is better than just one example
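To make the benchmark idea concrete, here is a minimal sketch of exact-match span scoring (precision, recall, F1) over a set of annotated examples. The span format `(start, end, label)` and the label names in the usage example are assumptions for illustration; a real benchmark would load gold annotations from a dataset and predictions from each model.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label); this format is an assumption


def span_prf(
    gold: Iterable[Set[Span]], pred: Iterable[Set[Span]]
) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over all examples."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans predicted exactly as annotated
        fp += len(p - g)   # predicted spans with no matching annotation
        fn += len(g - p)   # annotated spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up annotations for two examples, for illustration only.
gold = [{(0, 5, "NAME")}, {(10, 13, "ADDRESS"), (20, 25, "NAME")}]
pred = [{(0, 5, "NAME")}, {(20, 25, "NAME"), (30, 33, "PHONE")}]

precision, recall, f1 = span_prf(gold, pred)
print(precision, recall, f1)
```

Exact matching is strict; a partial-overlap criterion would score differently, so whichever rule is chosen should be applied identically to every model being compared.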

Regarding fine-tuning, I already mentioned it in another discussion. Thank you for paying attention to this example and for running it on Nvidia's model yourself. I tried a few examples on Nvidia and on yours — there's no clear leader in terms of quality, both have issues. It's just that Nvidia initially has more classes (even risk levels) and multilingual support, as well as a public dataset on this topic (I don't know if they actually trained on it, so I won't make that claim).

We are not in competition. We will benchmark against GLiNER and can work towards improving.

We selected fewer classes because we also wanted a smaller model for this release. Since people can fine-tune for other classes, we didn't feel that was a big issue.

mihaimaruseac changed discussion status to closed

For transparency: A third-party benchmark against GLiNER dropped.

There are places where this model works better, there are places where we need to work more.

Posting in a discussion that you closed yourself 👍

I saw this benchmark a few days ago on Reddit.
https://www.reddit.com/r/LocalLLaMA/comments/1t0sov1/openais_privacy_filter_vs_gliner_on_600_pii/

The author has a GPT image generation project on GitHub, so perhaps I drew their attention to the Nvidia model, and they in turn ran a benchmark and published it on Reddit.

In any case, the community has become more aware of PII models, despite the bile in the discussions.
