Model Benchmarks
| RANK | MODEL | PROVIDER | AVERAGE SCORE | SECURITY | SAFETY | RELIABILITY |
|------|-------|----------|---------------|----------|--------|-------------|
| 1 | 2-5 VL 1-5b Instruct | Alibaba Qwen | 54.20 | 40.54 | 67.86 | 0.00 |
| 2 | R1 Distill Llama 8b | Deepseek | 52.51 | 40.15 | 64.87 | 0.00 |
| 3 | Llama 3-1 70b Instruct | Meta | 50.30 | 38.22 | 62.38 | 0.00 |
| 4 | GPT-4.1 Nano 2025-04-14 | OpenAI | 42.90 | 33.85 | 51.94 | 0.00 |
| 5 | GPT-4o Mini 2024-07-18 | OpenAI | 37.76 | 37.31 | 38.20 | 0.00 |
| 6 | Llama 3.2 3B | Meta | 37.37 | 38.08 | 36.66 | 0.00 |
| 7 | Gemini 2.5 Flash Preview 04-17 | | 37.29 | 38.85 | 35.73 | 0.00 |
| 8 | R1 | Deepseek | 35.76 | 12.36 | 59.17 | 0.00 |
| 9 | Gemini 2.5 Pro Preview 05-06 | | 31.86 | 29.23 | 34.49 | 0.00 |
| 10 | GPT-4.1 2025-04-14 | Azure OpenAI | 31.83 | 45.38 | 47.08 | 3.03 |
| 11 | GPT-4.1 Mini 2025-04-14 | Azure OpenAI | 31.44 | 40.38 | 50.90 | 3.03 |
| 12 | Llama 3.2 1B | Meta | 25.96 | 45.77 | 29.08 | 3.03 |
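
The page does not say how AVERAGE SCORE is derived. The twelve rows listed above are, however, consistent with a simple rule: take the mean of SECURITY, SAFETY, and RELIABILITY, leaving RELIABILITY out when it is 0.00. The sketch below checks that rule against the listed figures. The rule is an inference from these rows only, not a documented formula, and the names `inferred_average` and `mismatches` are hypothetical.

```python
# Rows copied from the leaderboard above:
# (model, average score, security, safety, reliability)
ROWS = [
    ("2-5 VL 1-5b Instruct", 54.20, 40.54, 67.86, 0.00),
    ("R1 Distill Llama 8b", 52.51, 40.15, 64.87, 0.00),
    ("Llama 3-1 70b Instruct", 50.30, 38.22, 62.38, 0.00),
    ("GPT-4.1 Nano 2025-04-14", 42.90, 33.85, 51.94, 0.00),
    ("GPT-4o Mini 2024-07-18", 37.76, 37.31, 38.20, 0.00),
    ("Llama 3.2 3B", 37.37, 38.08, 36.66, 0.00),
    ("Gemini 2.5 Flash Preview 04-17", 37.29, 38.85, 35.73, 0.00),
    ("R1", 35.76, 12.36, 59.17, 0.00),
    ("Gemini 2.5 Pro Preview 05-06", 31.86, 29.23, 34.49, 0.00),
    ("GPT-4.1 2025-04-14", 31.83, 45.38, 47.08, 3.03),
    ("GPT-4.1 Mini 2025-04-14", 31.44, 40.38, 50.90, 3.03),
    ("Llama 3.2 1B", 25.96, 45.77, 29.08, 3.03),
]


def inferred_average(security, safety, reliability):
    """Assumed rule (not documented by the source): mean of the category
    scores, skipping reliability when it is listed as 0.00."""
    scores = [security, safety]
    if reliability > 0:
        scores.append(reliability)
    return sum(scores) / len(scores)


def mismatches(rows, tolerance=0.01):
    """Return the models whose listed average differs from the inferred one
    by more than the tolerance (which allows for two-decimal rounding)."""
    return [
        model
        for model, average, security, safety, reliability in rows
        if abs(inferred_average(security, safety, reliability) - average) > tolerance
    ]


if __name__ == "__main__":
    print(mismatches(ROWS))
```

If the rule holds, this explains why ranks 10 to 12 sit at the bottom despite SECURITY scores above 40: their low RELIABILITY value is counted in the mean, while the 0.00 entries in ranks 1 to 9 appear to be excluded from it.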