Model Benchmarks
| RANK | MODEL | PROVIDER | AVERAGE SCORE | SECURITY | SAFETY | RELIABILITY |
|---|---|---|---|---|---|---|
| 1 | GPT-4o Mini 2024-07-18 | OpenAI | 99.06 | 98.28 | 99.88 | 99.03 |
| 2 | Gemini 2.5 Pro Preview 05-06 | | 98.32 | 96.13 | 99.62 | 99.22 |
| 3 | GPT-4.1 Mini 2025-04-14 | Azure OpenAI | 98.31 | 95.98 | 99.53 | 99.42 |
| 4 | Gemini 2.5 Flash Preview 04-17 | | 96.40 | 90.54 | 99.45 | 99.22 |
| 5 | GPT-4.1 2025-04-14 | Azure OpenAI | 95.14 | 91.69 | 98.80 | 94.93 |
| 6 | GPT-4.1 Nano 2025-04-14 | OpenAI | 93.51 | 83.52 | 99.74 | 97.27 |
| 7 | Llama 3.2 3B | Meta | 64.32 | 55.01 | 82.18 | 55.75 |
| 8 | R1 Distill Llama 8b | Deepseek | 48.75 | 53.37 | 74.86 | 18.01 |
| 9 | R1 | Deepseek | 47.49 | 13.77 | 68.43 | 60.28 |
| 10 | 2-5 VL 1-5b Instruct | Alibaba Qwen | 46.99 | 56.24 | 74.10 | 10.62 |
| 11 | Llama 3.2 1B | Meta | 42.89 | 54.58 | 54.01 | 20.08 |
| 12 | Llama 3-1 70b Instruct | Meta | 41.00 | 26.26 | 68.81 | 27.94 |
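
The leaderboard does not say how AVERAGE SCORE is derived. The listed values look consistent with an unweighted mean of the SECURITY, SAFETY, and RELIABILITY scores, but that is an assumption and not something the page states. A minimal Python sketch to check it against the listed figures (the names `ROWS`, `mean_score`, and `max_deviation` are illustrative only):

```python
# Each entry: model -> ((security, safety, reliability), listed average).
# All numbers are copied from the leaderboard; nothing here is newly measured.
ROWS = {
    "GPT-4o Mini 2024-07-18": ((98.28, 99.88, 99.03), 99.06),
    "Gemini 2.5 Pro Preview 05-06": ((96.13, 99.62, 99.22), 98.32),
    "GPT-4.1 Mini 2025-04-14": ((95.98, 99.53, 99.42), 98.31),
    "Gemini 2.5 Flash Preview 04-17": ((90.54, 99.45, 99.22), 96.40),
    "GPT-4.1 2025-04-14": ((91.69, 98.80, 94.93), 95.14),
    "GPT-4.1 Nano 2025-04-14": ((83.52, 99.74, 97.27), 93.51),
    "Llama 3.2 3B": ((55.01, 82.18, 55.75), 64.32),
    "R1 Distill Llama 8b": ((53.37, 74.86, 18.01), 48.75),
    "R1": ((13.77, 68.43, 60.28), 47.49),
    "2-5 VL 1-5b Instruct": ((56.24, 74.10, 10.62), 46.99),
    "Llama 3.2 1B": ((54.58, 54.01, 20.08), 42.89),
    "Llama 3-1 70b Instruct": ((26.26, 68.81, 27.94), 41.00),
}


def mean_score(scores):
    """Unweighted mean of the category scores (assumed formula)."""
    return sum(scores) / len(scores)


def max_deviation(rows):
    """Largest absolute gap between the computed mean and the listed average."""
    return max(abs(mean_score(scores) - listed) for scores, listed in rows.values())


if __name__ == "__main__":
    for model, (scores, listed) in ROWS.items():
        print(f"{model}: computed {mean_score(scores):.2f}, listed {listed:.2f}")
    print(f"largest deviation: {max_deviation(ROWS):.4f}")
```

Under this assumption every listed average falls within 0.01 of the computed mean. The small remaining gaps would be expected if the published category scores were rounded before display, though the page does not confirm that either.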