WorldCuisines Leaderboard

Which Visual Language Model (VLM) is the BEST at understanding culture through food?
🏆 Welcome to the WorldCuisines leaderboard! The leaderboard evaluates VLMs' multilingual and multicultural understanding based on dishes from around the world.

We provide a small test set and a large test set. Both test sets contain the following tasks:

Dish Name (No Context): predict the name of a dish based on its image and question without any context.
Dish Name (Contextualized): predict the name of a dish based on its image and question with additional context information.
Dish Name (Adversarial): predict the name of a dish based on its image and question with adversarial context.
Location: predict the location where the dish originated and is commonly consumed, given the dish image, the question, and context.

Each test set has two settings:

MCQ: multiple choice questions
OEQ: open-ended questions
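
The test sets, settings, and tasks listed above combine into a grid of evaluation configurations. The following is a minimal sketch of that grid; the identifiers (`TEST_SETS`, `SETTINGS`, `TASKS`, `evaluation_grid`) are illustrative names and are not taken from the WorldCuisines repository.

```python
from itertools import product

# Illustrative names only; the official repository may use different identifiers.
TEST_SETS = ["small", "large"]
SETTINGS = ["MCQ", "OEQ"]
TASKS = [
    "Dish Name (No Context)",
    "Dish Name (Contextualized)",
    "Dish Name (Adversarial)",
    "Location",
]


def evaluation_grid():
    """Return every (test set, setting, task) combination as a list of tuples."""
    return list(product(TEST_SETS, SETTINGS, TASKS))


if __name__ == "__main__":
    for test_set, setting, task in evaluation_grid():
        print(f"{test_set:5s} | {setting} | {task}")
```

With two test sets, two settings, and four tasks, the grid contains 16 configurations.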

How to evaluate your model and submit your results?
Please refer to the guidelines in the GitHub README (soon to be released) to evaluate your own model.

ℹ️ Models marked with this icon use an optimized prompt (see our repository for details) instead of the original one.

Model	Avg	Dish Name (No Context)	Dish Name (Contextualized)	Dish Name (Adversarial)	Location
GPT-4o	81.17	88.4	90.43	82.23	63.6
Gemini 1.5 Flash	74.39	78.17	82.07	71.33	66
Llama 3.2 Instruct 90B	74.17	77.33	83.43	71.23	64.7
Qwen2 VL Instruct 72B	70.43	76.13	81.63	67.23	56.73
GPT-4o Mini	69.01	75.33	83	64.83	52.87
NVLM-D 72B	65.62	75.5	78.2	54.67	54.13
Qwen2 VL Instruct 7B	61.21	63.83	67.2	57	56.8
Aria 25B	58.48	65.77	71.43	57.13	39.6
Pixtral 12B	57.51	57.57	72.33	55.4	44.73
Llama 3.2 Instruct 11B	56.59	57.93	65.57	56.27	46.6
Pangea 7B ℹ️	56.03	54.87	65.77	55	48.47
Molmo-D 7B	48.27	50.67	57	48.67	36.73
Qwen2 VL Instruct 2B	45.2	40.97	44.4	47.07	48.37
Phi-3.5 Vision 4B	44.11	49.27	53.03	42.9	31.23
Llava1.6 Vicuna 13B	40.15	40.87	50.3	38.37	31.07
Molmo-O 7B	39.43	46.03	43.27	41.6	26.83
Llava1.6 Vicuna 7B	33.3	33.63	43.13	28.67	27.77
Molmo-E 1B	21.56	21.87	24.53	20.23	19.6
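
The Avg column in the table above appears to be the unweighted mean of the four task scores. The page does not state this explicitly, so treat it as an assumption; the sketch below checks it against a few rows copied from the table. The helper names (`average_score`, `matches_reported`) are hypothetical.

```python
def average_score(task_scores):
    """Unweighted mean of the per-task scores."""
    return sum(task_scores) / len(task_scores)


def matches_reported(task_scores, reported_avg, tol=0.015):
    """Check the mean against the published Avg, allowing for rounding."""
    return abs(average_score(task_scores) - reported_avg) <= tol


# Scores copied from the table above, in column order:
# No Context, Contextualized, Adversarial, Location.
ROWS = {
    "GPT-4o": ([88.4, 90.43, 82.23, 63.6], 81.17),
    "Gemini 1.5 Flash": ([78.17, 82.07, 71.33, 66], 74.39),
    "Qwen2 VL Instruct 72B": ([76.13, 81.63, 67.23, 56.73], 70.43),
}

if __name__ == "__main__":
    for model, (scores, reported) in ROWS.items():
        print(model, round(average_score(scores), 3), matches_reported(scores, reported))
```

The tolerance is there because the published values are already rounded to two decimals, so an exact equality check could fail on a rounding boundary.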

Model	Avg	Dish Name (No Context)	Dish Name (Contextualized)	Dish Name (Adversarial)	Location
GPT-4o	82.21	88.45	91.57	82.29	66.52
Gemini 1.5 Flash	74.67	77.05	80.97	69.13	71.53
Llama 3.2 Instruct 90B	73.11	77.69	82.92	63.96	67.87
Qwen2 VL Instruct 72B	69.83	74.19	80.79	62.43	61.9
GPT-4o Mini	66.14	72.8	81.65	57.76	52.37
NVLM-D 72B	63.21	69.82	78.93	52.12	51.97
Qwen2 VL Instruct 7B	59.69	61.48	67.85	53.52	55.9
Llama 3.2 Instruct 11B	58.79	59.93	64.12	53.17	57.93
Pixtral 12B	56.53	56.65	70.69	52.12	46.67
Aria 25B	55.88	58.61	69.29	52.82	42.82
Pangea 7B ℹ️	53.32	52.35	63.07	49.17	48.71
Molmo-D 7B	44.23	46.01	55.95	41.61	33.35
Qwen2 VL Instruct 2B	42.87	41.65	42.29	39.69	47.85
Phi-3.5 Vision 4B	41.99	43.37	48.71	40.87	35.01
Llava1.6 Vicuna 13B	41.3	40.17	48.17	39.05	37.79
Molmo-O 7B	38.28	39.96	44.93	38.41	29.81
Llava1.6 Vicuna 7B	36.28	34.57	43.48	34.84	32.24
Molmo-E 1B	20.39	18.81	24.22	19.55	18.97

Model	Avg	Dish Name (No Context)	Dish Name (Contextualized)	Dish Name (Adversarial)	Location
GPT-4o	25.05	16.6	35.47	12.6	35.53
Gemini 1.5 Flash	19.86	16.3	23.53	7.33	32.3
Llama 3.2 Instruct 90B	18.82	14.27	22.3	9	29.73
Llama 3.2 Instruct 11B	17.58	14.37	19.2	9.5	27.23
Qwen2 VL Instruct 72B	15.04	10.4	17.43	6.27	26.07
GPT-4o Mini	13.85	7.3	17.67	3.53	26.9
Qwen2 VL Instruct 7B	9.44	4.07	8.57	3.9	21.23
NVLM-D 72B	7.32	3.13	7.37	1.37	17.4
Aria 25B	6.66	2.67	6.47	1.8	15.7
Qwen2 VL Instruct 2B	5.96	3.33	4.6	3.43	12.5
Pangea 7B ℹ️	5.07	0.43	1.33	0.63	17.9
Molmo-O 7B	4.41	2.13	4.37	2.1	9.03
Molmo-D 7B	4.16	1	2.23	1.73	11.7
Pixtral 12B	3.96	0.6	1.83	0.57	12.83
Llava1.6 Vicuna 13B	3.85	1	4.17	1.6	8.63
Phi-3.5 Vision 4B	3.67	1.9	3.03	1.33	8.43
Llava1.6 Vicuna 7B	3.06	0.87	2.83	0.6	7.93
Molmo-E 1B	0.35	0	0.13	0	1.27

Model	Avg	Dish Name (No Context)	Dish Name (Contextualized)	Dish Name (Adversarial)	Location
GPT-4o	27.83	21.88	37.51	14.79	37.13
Llama 3.2 Instruct 11B	21.67	18.75	22.96	13.39	31.58
Llama 3.2 Instruct 90B	20.68	16.93	23.6	10.87	31.31
Qwen2 VL Instruct 72B	17.4	12.67	21.31	8.37	27.27
Gemini 1.5 Flash	16.12	12.81	15.16	6.46	30.03
GPT-4o Mini	15.66	10.28	20.87	5.72	25.79
Qwen2 VL Instruct 7B	11.07	6.76	10.36	6.12	21.03
Qwen2 VL Instruct 2B	9.35	7.98	8.13	6.74	14.55
NVLM-D 72B	8.64	4.71	10.29	2.89	16.68
Aria 25B	8.44	4.99	9.17	3.39	16.2
Pangea 7B ℹ️	6.49	1.52	2.73	1.57	20.15
Molmo-O 7B	6.19	5.15	6.03	3.51	10.07
Llava1.6 Vicuna 13B	5.34	2.79	5.85	2.57	10.16
Molmo-D 7B	5.08	2.89	3.66	2.31	11.45
Pixtral 12B	4.92	1.22	2.94	1.09	14.43
Phi-3.5 Vision 4B	4.61	2.91	4.23	2.07	9.22
Llava1.6 Vicuna 7B	4.08	1.59	4.03	1.41	9.29
Molmo-E 1B	0.45	0.01	0.23	0.01	1.54

WorldCuisines Leaderboard Submission Instructions

⚠ Please note that you need to submit the JSON file in the following format:

✉️✨ Submit your JSON file here!