Updated 1 minute ago. Sources: LiveBench Global, LiveBench Reasoning

Reasoning benchmarks

Logic, deduction, and inference tasks from LiveBench.

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | GPT-5.5 Thinking xHigh Effort | OpenAI | 80.7% |
| 2 | GPT-5.4 Thinking xHigh Effort | OpenAI | 80.3% |
| 3 | Gemini 3.1 Pro Preview High | Google | 79.9% |
| 4 | Claude 4.7 Opus Thinking xHigh Effort | Anthropic | 76.9% |
| 5 | Claude 4.6 Opus Thinking High Effort | Anthropic | 76.3% |
| 6 | Claude 4.5 Opus Thinking High Effort | Anthropic | 76.0% |
| 7 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 75.5% |
| 8 | Gemini 3.5 Flash High | Google | 75.0% |
| 9 | GPT-5.2 High | OpenAI | 74.8% |
| 10 | GPT-5.2 Codex | OpenAI | 74.3% |
| 11 | Qwen 3.7 Max | Alibaba | 74.3% |
| 12 | GPT-5.1 Codex Max High | OpenAI | 74.0% |
| 13 | DeepSeek V4 Pro | DeepSeek | 73.6% |
| 14 | Gemini 3 Pro Preview High | Google | 73.4% |
| 15 | GPT-5.3 Codex High | OpenAI | 72.8% |
| 16 | Gemini 3 Flash Preview High | Google | 72.4% |
| 17 | Kimi K2.6 Thinking | Moonshot AI | 72.2% |
| 18 | GPT-5.1 High | OpenAI | 72.0% |
| 19 | Qwen 3.6 Plus | Alibaba | 70.8% |
| 20 | GPT-5 Pro | OpenAI | 70.5% |
| 21 | GLM 5.1 | Z.AI | 70.2% |
| 22 | GPT-5.4 Nano xHigh | OpenAI | 70.1% |
| 23 | Kimi K2.5 Thinking | Moonshot AI | 69.1% |
| 24 | GLM 5 | Z.AI | 68.8% |
| 25 | GPT-5.1 Codex | OpenAI | 68.6% |
| 26 | Claude Sonnet 4.5 Thinking | Anthropic | 68.2% |
| 27 | Grok 4.20 Beta | xAI | 68.0% |
| 28 | GPT-5.4 Mini xHigh | OpenAI | 67.5% |
| 29 | DeepSeek V4 Flash | DeepSeek | 67.3% |
| 30 | Grok 4.3 | xAI | 66.7% |
| 31 | GPT-5 Mini High | OpenAI | 65.9% |
| 32 | Qwen 3.6 27B | Alibaba | 65.6% |
| 33 | Minimax M2.7 | Minimax | 63.5% |
| 34 | DeepSeek V3.2 Thinking | DeepSeek | 62.2% |
| 35 | Grok 4 | xAI | 62.0% |
| 36 | Claude 4.1 Opus Thinking | Anthropic | 61.8% |
| 37 | Gemini 3.1 Flash Lite Preview High | Google | 61.7% |
| 38 | Gemma 4 31B | Google | 61.6% |
| 39 | Kimi K2 Thinking | Moonshot AI | 61.6% |
| 40 | Claude Haiku 4.5 Thinking | Anthropic | 61.3% |
| 41 | Claude 4 Sonnet Thinking | Anthropic | 61.3% |
| 42 | GPT-5.1 Codex Mini | OpenAI | 60.4% |
| 43 | Qwen 3.6 Flash | Alibaba | 60.4% |
| 44 | Minimax M2.5 | Minimax | 60.1% |
| 45 | GPT-5.3 Instant | OpenAI | 60.0% |
| 46 | Grok 4.1 Fast | xAI | 60.0% |
| 47 | Claude 4.5 Opus Medium Effort | Anthropic | 59.1% |
| 48 | DeepSeek V3.2 Exp Thinking | DeepSeek | 58.9% |
| 49 | Gemini 2.5 Pro (Max Thinking) | Google | 58.3% |
| 50 | MiMo V2 Pro | Xiaomi | 58.1% |
| 51 | GLM 4.7 | Z.AI | 58.1% |
| 52 | GLM 4.6 | Z.AI | 55.2% |
| 53 | Claude 4.1 Opus | Anthropic | 54.5% |
| 54 | Claude Sonnet 4.5 | Anthropic | 53.7% |
| 55 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 53.1% |
| 56 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 53.0% |
| 57 | DeepSeek V3.2 | DeepSeek | 51.8% |
| 58 | Claude 4 Sonnet | Anthropic | 51.0% |
| 59 | Qwen 3 Next 80B A3B Thinking | Alibaba | 50.4% |
| 60 | DeepSeek V3.2 Exp | DeepSeek | 49.9% |
| 61 | GLM 5V Turbo | Z.AI | 49.6% |
| 62 | GPT-5.2 No Thinking | OpenAI | 48.9% |
| 63 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 48.8% |
| 64 | GPT-5 Nano High | OpenAI | 48.6% |
| 65 | Qwen 3 Next 80B A3B Instruct | Alibaba | 48.4% |
| 66 | Kimi K2 Instruct | Moonshot AI | 48.1% |
| 67 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 47.7% |
| 68 | GPT OSS 120b | OpenAI | 46.1% |
| 69 | Claude Haiku 4.5 | Anthropic | 45.3% |
| 70 | Grok Code Fast | xAI | 45.1% |
| 71 | Qwen 3 32B | Alibaba | 43.6% |
| 72 | GPT-5.1 No Thinking | OpenAI | 42.6% |
| 73 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 42.6% |
| 74 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 42.4% |
| 75 | Devstral 2 | Mistral | 41.2% |
| 76 | GLM 4.6V | Z.AI | 40.1% |
| 77 | Grok 4.20 Beta (Non-Reasoning) | xAI | 39.7% |
| 78 | Qwen 3 30B A3B | Alibaba | 39.0% |
| 79 | Elephant Alpha | OpenRouter | 36.0% |
| 80 | Grok 4.1 Fast (Non-Reasoning) | xAI | 33.5% |
| 81 | Trinity Large Preview | Arcee | 32.7% |
| 82 | Nemotron 3 Super 120B A12B | NVIDIA | 32.5% |

Need help choosing the right AI model for your business?

Benchmarks are a starting point, not an answer. The right model depends on your workload, budget, and integration constraints — let's figure it out together.