Model Benchmarks
| RANK | MODEL | PROVIDER | AVERAGE SCORE | SECURITY | SAFETY | RELIABILITY |
|---|---|---|---|---|---|---|
| 1 | GPT-4o Mini 2024-07-18 | OpenAI | 99.32 | 99.43 | 99.50 | 99.03 |
| 2 | Gemini 2.5 Pro Preview 05-06 | | 98.95 | 98.14 | 98.92 | 99.81 |
| 3 | Gemini 2.5 Flash Preview 04-17 | | 98.71 | 96.70 | 99.42 | 100 |
| 4 | GPT-4.1 2025-04-14 | Azure OpenAI | 97.78 | 96.85 | 98.25 | 98.25 |
| 5 | GPT-4.1 Mini 2025-04-14 | Azure OpenAI | 97.70 | 96.13 | 99.12 | 97.86 |
| 6 | GPT-4.1 Nano 2025-04-14 | OpenAI | 96.79 | 94.27 | 99.80 | 96.30 |
| 7 | R1 | Deepseek | 85.34 | 68.87 | 93.38 | 93.76 |
| 8 | Llama 3-1 70b Instruct | Meta | 77.63 | 62.84 | 88.28 | 81.76 |
| 9 | R1 Distill Llama 8b | Deepseek | 64.15 | 72.31 | 82.04 | 38.11 |
| 10 | Llama 3.2 3B | Meta | 62.60 | 62.03 | 74.69 | 51.07 |
| 11 | 2-5 VL 1-5b Instruct | Alibaba Qwen | 52.06 | 66.01 | 74.48 | 15.70 |
| 12 | Llama 3.2 1B | Meta | 40.82 | 56.59 | 46.37 | 19.49 |
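
The page does not say how AVERAGE SCORE is derived. The listed values are consistent with an unweighted mean of the SECURITY, SAFETY, and RELIABILITY scores, so the sketch below assumes that formula. The function name `average_score` is hypothetical and used only for illustration.

```python
def average_score(security: float, safety: float, reliability: float) -> float:
    """Unweighted mean of the three sub-scores, rounded to two decimals.

    Assumption: the leaderboard averages the three columns equally.
    The source page does not state its formula.
    """
    return round((security + safety + reliability) / 3, 2)


# Sub-scores copied from the leaderboard rows for rank 1 and rank 12.
print(average_score(99.43, 99.50, 99.03))  # 99.32
print(average_score(56.59, 46.37, 19.49))  # 40.82
```

Under this assumption, eleven of the twelve rows reproduce the listed average exactly. The rank 2 row gives 98.96 against the listed 98.95, which may mean the page averages unrounded sub-scores or truncates instead of rounding.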