Model Benchmarks
| RANK | MODEL | PROVIDER | AVERAGE SCORE | SECURITY | SAFETY | RELIABILITY |
|------|-------|----------|---------------|----------|--------|-------------|
| 1 | GPT-4.1 Mini 2025-04-14 | Azure OpenAI | 75.82 | 77.50 | 83.19 | 66.77 |
| 2 | GPT-4.1 2025-04-14 | Azure OpenAI | 74.92 | 77.97 | 81.37 | 65.40 |
| 3 | GPT-4o Mini 2024-07-18 | OpenAI | 74.52 | 78.34 | 79.20 | 66.02 |
| 4 | Gemini 2.5 Flash Preview 04-17 | | 73.32 | 75.37 | 78.20 | 66.41 |
| 5 | GPT-4.1 Nano 2025-04-14 | OpenAI | 72.96 | 70.55 | 83.83 | 64.52 |
| 6 | Gemini 2.5 Pro Preview 05-06 | | 72.84 | 74.50 | 77.68 | 66.34 |
| 7 | R1 | Deepseek | 52.22 | 31.67 | 73.66 | 51.35 |
| 8 | Llama 3-1 70b Instruct | Meta | 50.72 | 42.44 | 73.16 | 36.57 |
| 9 | Llama 3.2 3B | Meta | 50.61 | 51.71 | 64.51 | 35.61 |
| 10 | R1 Distill Llama 8b | Deepseek | 49.30 | 55.28 | 73.92 | 18.71 |
| 11 | 2-5 VL 1-5b Instruct | Alibaba Qwen | 45.06 | 54.26 | 72.15 | 8.78 |
| 12 | Llama 3.2 1B | Meta | 36.56 | 52.31 | 43.15 | 14.20 |
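
The page does not say how AVERAGE SCORE is derived. The listed figures are consistent with an unweighted mean of the SECURITY, SAFETY, and RELIABILITY scores, to within 0.01. That reading is an inference from the numbers, not a documented formula. The sketch below checks it against the twelve rows; the function names `mean_of_categories` and `max_deviation` are illustrative only.

```python
# Rows copied from the leaderboard: (model, average, security, safety, reliability).
ROWS = [
    ("GPT-4.1 Mini 2025-04-14", 75.82, 77.50, 83.19, 66.77),
    ("GPT-4.1 2025-04-14", 74.92, 77.97, 81.37, 65.40),
    ("GPT-4o Mini 2024-07-18", 74.52, 78.34, 79.20, 66.02),
    ("Gemini 2.5 Flash Preview 04-17", 73.32, 75.37, 78.20, 66.41),
    ("GPT-4.1 Nano 2025-04-14", 72.96, 70.55, 83.83, 64.52),
    ("Gemini 2.5 Pro Preview 05-06", 72.84, 74.50, 77.68, 66.34),
    ("R1", 52.22, 31.67, 73.66, 51.35),
    ("Llama 3-1 70b Instruct", 50.72, 42.44, 73.16, 36.57),
    ("Llama 3.2 3B", 50.61, 51.71, 64.51, 35.61),
    ("R1 Distill Llama 8b", 49.30, 55.28, 73.92, 18.71),
    ("2-5 VL 1-5b Instruct", 45.06, 54.26, 72.15, 8.78),
    ("Llama 3.2 1B", 36.56, 52.31, 43.15, 14.20),
]


def mean_of_categories(security, safety, reliability):
    """Unweighted mean of the three category scores (assumed formula)."""
    return (security + safety + reliability) / 3


def max_deviation(rows):
    """Largest absolute gap between a listed average and the recomputed mean."""
    return max(
        abs(avg - mean_of_categories(sec, saf, rel))
        for _, avg, sec, saf, rel in rows
    )


if __name__ == "__main__":
    for model, avg, sec, saf, rel in ROWS:
        recomputed = mean_of_categories(sec, saf, rel)
        print(f"{model:32s} listed={avg:6.2f} recomputed={recomputed:7.3f}")
    print(f"largest gap: {max_deviation(ROWS):.4f}")
```

Every recomputed mean lands within 0.01 of the listed average. The small residual gaps would be expected if the site averages unrounded category scores and rounds only for display, though the page does not confirm this.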