Source: LiveBench Math
Math Benchmarks
Numerical reasoning and mathematical problem solving, as measured by LiveBench.
LiveBench Math
| # | Model | Provider | Score | Input ($/M tokens) | Output ($/M tokens) | Context | CI |
|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 Thinking xHigh Effort | OpenAI | 96.3% | — | — | — | — |
| 2 | GPT-5.4 Thinking xHigh Effort | OpenAI | 94.2% | — | — | — | — |
| 3 | GPT-5.2 High | OpenAI | 93.2% | — | — | — | — |
| 4 | Claude 4.7 Opus Thinking xHigh Effort | Anthropic | 93.1% | — | — | — | — |
| 5 | GPT-5.4 Nano xHigh | OpenAI | 91.3% | — | — | — | — |
| 6 | Gemini 3.1 Pro Preview High | Google | 91.0% | — | — | — | — |
| 7 | DeepSeek V4 Pro | DeepSeek | 90.7% | — | — | — | — |
| 8 | Claude 4.5 Opus Thinking High Effort | Anthropic | 90.4% | — | — | — | — |
| 9 | Claude 4.6 Opus Thinking High Effort | Anthropic | 89.3% | — | — | — | — |
| 10 | GPT-5.2 Codex | OpenAI | 88.8% | — | — | — | — |
| 11 | Gemini 3.5 Flash High | Google | 88.2% | — | — | — | — |
| 12 | GPT-5.3 Codex High | OpenAI | 87.8% | — | — | — | — |
| 13 | Grok 4.20 Beta | xAI | 87.1% | — | — | — | — |
| 14 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 87.0% | — | — | — | — |
| 15 | GPT-5.1 High | OpenAI | 86.9% | — | — | — | — |
| 16 | GPT-5 Pro | OpenAI | 86.2% | — | — | — | — |
| 17 | Qwen 3.7 Max | Alibaba | 85.3% | — | — | — | — |
| 18 | DeepSeek V3.2 Thinking | DeepSeek | 85.0% | — | — | — | — |
| 19 | GLM 5.1 | Z.AI | 84.9% | — | — | — | — |
| 20 | Kimi K2.5 Thinking | Moonshot AI | 84.9% | — | — | — | — |
| 21 | Grok 4.3 | xAI | 84.3% | — | — | — | — |
| 22 | Kimi K2.6 Thinking | Moonshot AI | 84.3% | — | — | — | — |
| 23 | Gemini 3 Flash Preview High | Google | 84.2% | — | — | — | — |
| 24 | Qwen 3.6 Plus | Alibaba | 83.7% | — | — | — | — |
| 25 | Grok 4.1 Fast | xAI | 83.7% | — | — | — | — |
| 26 | GLM 5 | Z.AI | 83.5% | — | — | — | — |
| 27 | GPT-5.1 Codex Max High | OpenAI | 83.2% | — | — | — | — |
| 28 | Grok 4 | xAI | 83.0% | — | — | — | — |
| 29 | DeepSeek V3.2 Exp Thinking | DeepSeek | 82.4% | — | — | — | — |
| 30 | GPT-5 Mini High | OpenAI | 82.2% | — | — | — | — |
| 31 | Gemini 3 Pro Preview High | Google | 81.8% | — | — | — | — |
| 32 | GLM 4.6 | Z.AI | 81.1% | — | — | — | — |
| 33 | Kimi K2 Thinking | Moonshot AI | 81.1% | — | — | — | — |
| 34 | Minimax M2.7 | Minimax | 80.5% | — | — | — | — |
| 35 | Qwen 3.6 27B | Alibaba | 79.9% | — | — | — | — |
| 36 | DeepSeek V4 Flash | DeepSeek | 79.7% | — | — | — | — |
| 37 | GPT-5.1 Codex | OpenAI | 79.6% | — | — | — | — |
| 38 | Claude Sonnet 4.5 Thinking | Anthropic | 79.3% | — | — | — | — |
| 39 | Qwen 3.6 Flash | Alibaba | 78.9% | — | — | — | — |
| 40 | GPT-5.4 Mini xHigh | OpenAI | 78.6% | — | — | — | — |
| 41 | Claude Haiku 4.5 Thinking | Anthropic | 77.5% | — | — | — | — |
| 42 | Minimax M2.5 | Minimax | 77.4% | — | — | — | — |
| 43 | MiMo V2 Pro | Xiaomi | 77.0% | — | — | — | — |
| 44 | GPT-5.1 Codex Mini | OpenAI | 76.3% | — | — | — | — |
| 45 | GLM 4.7 | Z.AI | 76.0% | — | — | — | — |
| 46 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 75.3% | — | — | — | — |
| 47 | Qwen 3 Next 80B A3B Thinking | Alibaba | 74.3% | — | — | — | — |
| 48 | Gemma 4 31B | Google | 73.9% | — | — | — | — |
| 49 | Gemini 3.1 Flash Lite Preview High | Google | 73.6% | — | — | — | — |
| 50 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 73.4% | — | — | — | — |
| 51 | Claude 4.1 Opus Thinking | Anthropic | 73.2% | — | — | — | — |
| 52 | GPT-5.3 Instant | OpenAI | 72.4% | — | — | — | — |
| 53 | Claude 4 Sonnet Thinking | Anthropic | 70.5% | — | — | — | — |
| 54 | GLM 5V Turbo | Z.AI | 70.4% | — | — | — | — |
| 55 | Qwen 3 Next 80B A3B Instruct | Alibaba | 70.2% | — | — | — | — |
| 56 | GPT OSS 120b | OpenAI | 68.9% | — | — | — | — |
| 57 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 68.8% | — | — | — | — |
| 58 | GPT-5 Nano High | OpenAI | 68.4% | — | — | — | — |
| 59 | Gemini 2.5 Pro (Max Thinking) | Google | 68.3% | — | — | — | — |
| 60 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 68.0% | — | — | — | — |
| 61 | Qwen 3 32B | Alibaba | 67.4% | — | — | — | — |
| 62 | Claude 4.5 Opus Medium Effort | Anthropic | 66.3% | — | — | — | — |
| 63 | Qwen 3 30B A3B | Alibaba | 65.3% | — | — | — | — |
| 64 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 64.9% | — | — | — | — |
| 65 | DeepSeek V3.2 Exp | DeepSeek | 64.4% | — | — | — | — |
| 66 | DeepSeek V3.2 | DeepSeek | 64.0% | — | — | — | — |
| 67 | Claude 4.1 Opus | Anthropic | 62.8% | — | — | — | — |
| 68 | Claude Sonnet 4.5 | Anthropic | 62.6% | — | — | — | — |
| 69 | GLM 4.6V | Z.AI | 62.5% | — | — | — | — |
| 70 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 61.0% | — | — | — | — |
| 71 | Claude 4 Sonnet | Anthropic | 60.4% | — | — | — | — |
| 72 | GPT-5.2 No Thinking | OpenAI | 58.3% | — | — | — | — |
| 73 | Kimi K2 Instruct | Moonshot AI | 58.1% | — | — | — | — |
| 74 | Claude Haiku 4.5 | Anthropic | 58.0% | — | — | — | — |
| 75 | Elephant Alpha | OpenRouter | 57.5% | — | — | — | — |
| 76 | Grok Code Fast | xAI | 56.0% | — | — | — | — |
| 77 | Devstral 2 | Mistral | 52.5% | — | — | — | — |
| 78 | Grok 4.20 Beta (Non-Reasoning) | xAI | 45.5% | — | — | — | — |
| 79 | Trinity Large Preview | Arcee | 44.9% | — | — | — | — |
| 80 | GPT-5.1 No Thinking | OpenAI | 44.5% | — | — | — | — |
| 81 | Grok 4.1 Fast (Non-Reasoning) | xAI | 38.9% | — | — | — | — |
| 82 | Nemotron 3 Super 120B A12B | NVIDIA | 36.4% | — | — | — | — |
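If you want to slice this leaderboard yourself, for example to see each provider's strongest entry, a few lines of code are enough once the rows are in a structured form. The sketch below is a minimal illustration under stated assumptions: the rows are a hand-copied subset of the table (not the full leaderboard), and `Entry` and `best_per_provider` are hypothetical helpers, not part of LiveBench or any published API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    rank: int
    model: str
    provider: str
    score: float  # LiveBench Math score in percent


# Hypothetical, hand-copied subset of the leaderboard rows, in rank order.
ENTRIES = [
    Entry(1, "GPT-5.5 Thinking xHigh Effort", "OpenAI", 96.3),
    Entry(4, "Claude 4.7 Opus Thinking xHigh Effort", "Anthropic", 93.1),
    Entry(6, "Gemini 3.1 Pro Preview High", "Google", 91.0),
    Entry(7, "DeepSeek V4 Pro", "DeepSeek", 90.7),
    Entry(13, "Grok 4.20 Beta", "xAI", 87.1),
    Entry(19, "GLM 5.1", "Z.AI", 84.9),
    Entry(20, "Kimi K2.5 Thinking", "Moonshot AI", 84.9),
    Entry(78, "Grok 4.20 Beta (Non-Reasoning)", "xAI", 45.5),
]


def best_per_provider(entries: list[Entry]) -> dict[str, Entry]:
    """Return the highest-scoring entry for each provider.

    Uses a strict '>' comparison, so on a tie the entry seen first wins;
    with rank-ordered input that is the higher-ranked row.
    """
    best: dict[str, Entry] = {}
    for entry in entries:
        current = best.get(entry.provider)
        if current is None or entry.score > current.score:
            best[entry.provider] = entry
    return best


if __name__ == "__main__":
    for provider, entry in best_per_provider(ENTRIES).items():
        print(f"{provider}: {entry.model} ({entry.score:.1f}%)")
```

Keeping the rows as a flat list of immutable records makes other views (filtering by a score threshold, comparing reasoning and non-reasoning variants) a one-line comprehension over the same data.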
Need help choosing the right AI model?
Benchmarks are a starting point, not an answer. The right model depends on your workload, budget, and integration requirements. Let's figure it out together.