
Updated 1 minute ago. Source: LiveBench Data Analysis


Data Analysis Benchmarks

Structured data interpretation, querying, and analysis, as measured by LiveBench.

LiveBench Data Analysis

| # | Model | Provider | Score |
|---|-------|----------|-------|
| 1 | GPT-5.5 Thinking xHigh Effort | OpenAI | 81.1% |
| 2 | GPT-5.4 Thinking xHigh Effort | OpenAI | 79.3% |
| 3 | Gemini 3.1 Pro Preview High | Google | 78.5% |
| 4 | Claude 4.8 Opus Thinking xHigh Effort | Anthropic | 78.3% |
| 5 | Claude 4.7 Opus Thinking xHigh Effort | Anthropic | 78.3% |
| 6 | GPT-5.2 Codex | OpenAI | 78.2% |
| 7 | GPT-5.2 High | OpenAI | 78.2% |
| 8 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 78.0% |
| 9 | Gemini 3 Flash Preview High | Google | 74.8% |
| 10 | DeepSeek V4 Pro | DeepSeek | 74.5% |
| 11 | Claude 4.5 Opus Thinking High Effort | Anthropic | 74.4% |
| 12 | Gemini 3 Pro Preview High | Google | 74.4% |
| 13 | Qwen 3.7 Max | Alibaba | 71.8% |
| 14 | GPT-5.4 Mini xHigh | OpenAI | 71.0% |
| 15 | Qwen 3.6 27B | Alibaba | 70.4% |
| 16 | GPT-5.1 Codex Max High | OpenAI | 70.1% |
| 17 | Qwen 3.6 Plus | Alibaba | 69.9% |
| 18 | Claude 4.6 Opus Thinking High Effort | Anthropic | 69.9% |
| 19 | GPT-5.1 High | OpenAI | 69.6% |
| 20 | DeepSeek V4 Flash | DeepSeek | 68.0% |
| 21 | GLM 5 | Z.AI | 67.9% |
| 22 | GPT-5.4 Nano xHigh | OpenAI | 67.6% |
| 23 | Kimi K2.6 Thinking | Moonshot AI | 65.1% |
| 24 | Gemini 3.5 Flash High | Google | 64.9% |
| 25 | Grok 4 | xAI | 63.4% |
| 26 | GLM 5.1 | Z.AI | 63.2% |
| 27 | Grok 4.20 Beta | xAI | 62.9% |
| 28 | GPT-5.3 Codex High | OpenAI | 62.7% |
| 29 | Kimi K2.5 Thinking | Moonshot AI | 61.4% |
| 30 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 61.0% |
| 31 | GPT-5.1 Codex | OpenAI | 60.8% |
| 32 | Claude Haiku 4.5 Thinking | Anthropic | 59.3% |
| 33 | Qwen 3.6 Flash | Alibaba | 58.8% |
| 34 | Gemma 4 31B | Google | 58.8% |
| 35 | GPT-5 Pro | OpenAI | 57.0% |
| 36 | Claude Sonnet 4.5 Thinking | Anthropic | 57.0% |
| 37 | Minimax M2.7 | MiniMax | 56.3% |
| 38 | Grok 4.3 | xAI | 55.8% |
| 39 | GPT-5 Mini High | OpenAI | 55.2% |
| 40 | GLM 4.7 | Z.AI | 55.2% |
| 41 | Gemini 3.1 Flash Lite Preview High | Google | 54.9% |
| 42 | Claude 4 Sonnet Thinking | Anthropic | 54.6% |
| 43 | GLM 5V Turbo | Z.AI | 54.1% |
| 44 | Qwen 3 Next 80B A3B Thinking | Alibaba | 53.6% |
| 45 | Kimi K2 Thinking | Moonshot AI | 52.3% |
| 46 | Grok 4.1 Fast | xAI | 52.2% |
| 47 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 52.2% |
| 48 | GLM 4.6 | Z.AI | 52.0% |
| 49 | Gemini 2.5 Pro (Max Thinking) | Google | 51.6% |
| 50 | DeepSeek V3.2 Exp Thinking | DeepSeek | 51.5% |
| 51 | DeepSeek V3.2 Thinking | DeepSeek | 50.0% |
| 52 | Qwen 3 Next 80B A3B Instruct | Alibaba | 49.8% |
| 53 | GPT-5.1 Codex Mini | OpenAI | 49.7% |
| 54 | Minimax M2.5 | MiniMax | 49.6% |
| 55 | MiMo V2 Pro | Xiaomi | 49.2% |
| 56 | Grok Code Fast | xAI | 49.0% |
| 57 | Claude 4.1 Opus Thinking | Anthropic | 49.0% |
| 58 | GPT-5.3 Instant | OpenAI | 48.0% |
| 59 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 47.9% |
| 60 | GPT-5.2 No Thinking | OpenAI | 47.7% |
| 61 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 47.3% |
| 62 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 47.0% |
| 63 | Claude Sonnet 4.5 | Anthropic | 47.0% |
| 64 | Qwen 3 32B | Alibaba | 46.5% |
| 65 | GLM 4.6V | Z.AI | 46.4% |
| 66 | Claude 4.5 Opus Medium Effort | Anthropic | 45.5% |
| 67 | Claude 4.1 Opus | Anthropic | 45.4% |
| 68 | Claude Haiku 4.5 | Anthropic | 45.1% |
| 69 | DeepSeek V3.2 | DeepSeek | 45.0% |
| 70 | Qwen 3 30B A3B | Alibaba | 44.9% |
| 71 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 44.7% |
| 72 | DeepSeek V3.2 Exp | DeepSeek | 44.3% |
| 73 | Claude 4 Sonnet | Anthropic | 44.1% |
| 74 | GPT-5.1 No Thinking | OpenAI | 44.1% |
| 75 | Grok 4.20 Beta (Non-Reasoning) | xAI | 43.5% |
| 76 | GPT-5 Nano High | OpenAI | 43.4% |
| 77 | Kimi K2 Instruct | Moonshot AI | 43.3% |
| 78 | Grok 4.1 Fast (Non-Reasoning) | xAI | 40.6% |
| 79 | Trinity Large Preview | Arcee AI | 40.3% |
| 80 | Devstral 2 | Mistral | 39.1% |
| 81 | GPT OSS 120b | OpenAI | 38.8% |
| 82 | Elephant Alpha | OpenRouter | 38.5% |
| 83 | Nemotron 3 Super 120B A12B | NVIDIA | 21.2% |


Need help choosing the right AI model?

Benchmarks are a starting point, not an answer. The right model depends on your workload, budget, and integration requirements, so let's work it out together.