omarkamali 
posted an update 10 days ago
Just sharing a little breakthrough with Gherbal LID: we managed to distinguish 15 variants of Arabic, with 6 variants above 90% accuracy and 10 variants above 85%. In practice, this means telling apart Moroccan and Algerian, which overlap massively.

It also embraces the duality of MSA and Arabic variants pioneered in ALDi by @AMR-KELEG et al.

Now we're only bottlenecked by the availability of high-quality data for the low-scoring variants such as Iraqi, Libyan, Sudanese, Adeni ...

More on Gherbal at:
https://omneitylabs.com/models/gherbal

About ALDi (Arabic Level of Dialectness):

The latest Gherbal follows a similar convention to ALDi (MSA and variants are scored separately) but does not reuse any of ALDi's code or data.
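
To make the "scored separately" convention concrete, here is a minimal sketch of what such an output could look like. This is not Gherbal's or ALDi's actual API: the `LidResult` class, the `summarize` helper, the 0.5 threshold, and the scores are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LidResult:
    """Hypothetical output shape: MSA and dialect variants are scored independently."""
    msa_score: float                  # assumed to lie in [0, 1]
    variant_scores: dict[str, float]  # one independent score per dialect variant


def summarize(result: LidResult, threshold: float = 0.5) -> dict:
    """Report the MSA decision and the best-scoring variant as two separate answers."""
    scores = result.variant_scores
    best_variant = max(scores, key=scores.get, default=None)
    return {
        "is_msa": result.msa_score >= threshold,
        "top_variant": best_variant,
        "top_variant_score": scores.get(best_variant, 0.0),
    }


# Made-up scores, for illustration only.
example = LidResult(msa_score=0.2, variant_scores={"Moroccan": 0.81, "Algerian": 0.64})
summary = summarize(example)
print(summary)
```

Keeping the two scores apart means a text can be reported as, say, weakly MSA and strongly Moroccan at the same time, instead of forcing MSA to compete with the dialects for a single label.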

For more information about the original ALDi: https://github.com/AMR-KELEG/ALDi
