Coding benchmarks
Code generation and completion tasks from Code Arena (Elo) and LiveBench.
Code Arena
| # | Model | Provider | Score (Elo) | Input $/M | Output $/M | Context | Votes |
|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4 7 Thinking | Anthropic | 1567 | $5.00 | $25 | 1M | 5.3K |
| 2 | Claude Opus 4 7 | Anthropic | 1562 | $5.00 | $25 | 1M | 4.9K |
| 3 | Claude Opus 4.6 Thinking | Anthropic | 1542 | $5.00 | $25 | 1M | 7.9K |
| 4 | Qwen3.7 Max 20260517 | Alibaba | 1541 | $2.50 | $7.50 | 1M | 1.5K |
| 5 | Claude Opus 4.6 | Anthropic | 1538 | $5.00 | $25 | 1M | 8.9K |
| 6 | Glm 5.1 | Z.ai | 1533 | $1.40 | $4.40 | 203K | 3.6K |
| 7 | Claude Sonnet 4.6 | Anthropic | 1523 | $3.00 | $15 | 1M | 11.1K |
| 8 | Kimi K2.6 | Moonshot | 1518 | $0.95 | $4.00 | 262K | 4.0K |
| 9 | Muse Spark | Meta | 1508 | — | — | — | 1.6K |
| 10 | Gemini 3.5 Flash | Google | 1506 | $1.50 | $9.00 | 1M | 2.2K |
| 11 | — | — | 1505 | — | — | — | 4.1K |
| 12 | Claude Opus 4.5 Thinking | Anthropic | 1490 | $5.00 | $25 | 200K | 13.1K |
| 13 | Qwen3.6 Max Preview | Alibaba | 1486 | $1.04 | $6.24 | 262K | 2.5K |
| 14 | — | — | 1479 | — | — | — | 4.3K |
| 15 | Mimo v2.5 Pro | Xiaomi | 1471 | $1.00 | $3.00 | 1M | 4.7K |
| 16 | Claude Opus 4.5 | Anthropic | 1467 | $5.00 | $25 | 200K | 15.3K |
| 17 | Deepseek v4 Pro Thinking | DeepSeek | 1464 | $0.43 | $0.87 | 1M | 4.0K |
| 18 | Qwen3.6 Plus | Alibaba | 1460 | $0.33 | $1.95 | 1M | 6.1K |
| 19 | — | — | 1457 | $2.50 | $15 | 1M | 1.5K |
| 20 | Gemini 3.1 Pro | Google | 1448 | $2.00 | $12 | 1M | 10.3K |
| 21 | gpt-5.5 (codex-harness) | OpenAI | 1444 | — | — | — | 4.1K |
| 22 | GLM-4.7 | Z.ai | 1440 | $0.40 | $1.75 | 203K | 4.9K |
| 23 | Mimo v2.5 | Xiaomi | 1440 | $0.40 | $2.00 | 1M | 3.7K |
| 24 | Gemini 3 Pro | Google | 1438 | $2.00 | $12 | 1M | 17.2K |
| 25 | — | — | 1437 | $2.50 | $15 | 1M | 1.4K |
| 26 | Gemini 3 Flash | Google | 1437 | $0.50 | $3.00 | 1M | 13.3K |
| 27 | GLM-5 | Z.ai | 1436 | $1.00 | $3.20 | 203K | 6.6K |
| 28 | MiMo V2 Pro | Xiaomi | 1434 | $1.00 | $3.00 | 1M | 6.8K |
| 29 | Kimi K2.5 Thinking | Moonshot | 1431 | $0.60 | $3.00 | — | 10.7K |
| 30 | Kimi K2.5 Instant | Moonshot | 1408 | $0.40 | $1.90 | 262K | 3.6K |
| 31 | — | — | 1407 | $1.75 | $14 | 400K | 3.0K |
| 32 | GPT-5.2 | OpenAI | 1404 | $1.75 | $14 | 400K | 1.5K |
| 33 | GPT-5.4 Mini | OpenAI | 1402 | $0.75 | $4.50 | 400K | 5.5K |
| 34 | MiniMax M2.7 | MiniMax | 1401 | $0.28 | $1.20 | 205K | 6.3K |
| 35 | — | — | 1395 | $2.00 | $6.00 | 2M | 7.2K |
| 36 | GPT-5 Medium | OpenAI | 1394 | $1.25 | $10 | 400K | 3.8K |
| 37 | Qwen 3.5 397B | Alibaba | 1393 | $0.39 | $2.34 | 262K | 9.7K |
| 38 | MiniMax M2.1 | MiniMax | 1392 | $0.29 | $0.95 | 205K | 9.3K |
| 39 | GPT-5.1 Medium | OpenAI | 1391 | $1.25 | $10 | 400K | 6.1K |
| 40 | Gpt 5.4 | OpenAI | 1388 | $2.50 | $15 | 1M | 239 |
| 41 | Claude Sonnet 4.5 Thinking | Anthropic | 1388 | $3.00 | $15 | 200K | 15.7K |
| 42 | — | — | 1387 | $0.50 | $3.00 | 1M | 16.4K |
| 43 | Claude Sonnet 4.5 | Anthropic | 1386 | $3.00 | $15 | 200K | 18.4K |
| 44 | Claude Opus 4.1 | Anthropic | 1386 | $15 | $75 | 200K | 8.6K |
| 45 | MiniMax M2.5 | MiniMax | 1382 | $0.15 | $1.15 | 205K | 7.8K |
| 46 | Gemma 4 31b | Google | 1380 | $0.14 | $0.40 | 262K | 3.4K |
| 47 | Grok 4.3 | xAI | 1377 | $1.25 | $2.50 | 1M | 3.5K |
| 48 | — | — | 1373 | $1.75 | $14 | 400K | 3.5K |
| 49 | DeepSeek V3.2 Thinking | DeepSeek | 1368 | $0.25 | $0.38 | 131K | 7.9K |
| 50 | Hunyuan Hy3 Preview | Tencent | 1365 | — | — | — | 1.3K |
| 51 | Qwen 3.5 122B | Alibaba | 1365 | $0.26 | $2.08 | 262K | 8.1K |
| 52 | Gemma 4 26b A4b | Google | 1360 | — | — | — | 1.5K |
| 53 | Qwen 3.5 27B | Alibaba | 1358 | $0.20 | $1.56 | 262K | 7.7K |
| 54 | GLM-4.6 | Z.ai | 1355 | $0.43 | $1.74 | 203K | 8.3K |
| 55 | GPT-5.1 | OpenAI | 1340 | $1.25 | $10 | 400K | 12.9K |
| 56 | — | — | 1337 | $0.10 | $0.30 | 262K | 6.7K |
| 57 | GPT-5.2 Codex | OpenAI | 1335 | $1.75 | $14 | 400K | 7.8K |
| 58 | DeepSeek V3.2 | DeepSeek | 1332 | $0.25 | $0.38 | 131K | 10.5K |
| 59 | Kimi K2 Turbo | Moonshot | 1329 | $1.15 | $8.00 | 262K | 15.3K |
| 60 | Gpt 5.1 Codex | OpenAI | 1329 | $1.25 | $10 | 400K | 6.2K |
| 61 | Claude Haiku 4.5 | Anthropic | 1322 | $1.00 | $5.00 | 200K | 20.6K |
| 62 | MiniMax M2 | MiniMax | 1305 | $0.26 | $1.00 | 205K | 8.4K |
| 63 | mimo-v2-flash (thinking) | Xiaomi | 1300 | $0.10 | $0.30 | 262K | 2.1K |
| 64 | Deepseek v3.2 Exp | DeepSeek | 1287 | $0.27 | $0.41 | 164K | 4.9K |
| 65 | Qwen 3 Coder | Alibaba | 1282 | $0.40 | $1.60 | 262K | 15.2K |
| 66 | KAT Coder Pro v1 | Kwai | 1259 | $0.21 | $0.83 | 256K | 1.9K |
| 67 | Qwen3.5 35b A3b | Alibaba | 1249 | $0.14 | $1.00 | 262K | 1.8K |
| 68 | Gemini 3.1 Flash Lite | Google | 1248 | $0.25 | $1.50 | 1M | 9.3K |
| 69 | Trinity Large Thinking | Arcee AI | 1245 | $0.22 | $0.85 | 262K | 1.3K |
| 70 | Gpt 5.1 Codex Mini | OpenAI | 1240 | $0.25 | $2.00 | 400K | 1.4K |
| 71 | Qwen3.5 Flash | Alibaba | 1237 | — | — | — | 1.6K |
| 72 | — | — | 1234 | $0.20 | $0.50 | 2M | 6.9K |
| 73 | Mistral Large 3 | Mistral | 1223 | $0.50 | $1.50 | — | 1.0K |
| 74 | — | — | 1209 | — | — | — | 1.2K |
| 75 | Gemini 2.5 Pro | Google | 1204 | $1.25 | $10 | 1M | 3.3K |
| 76 | — | — | 1202 | $0.05 | $0.10 | 131K | 1.7K |
| 77 | Devstral 2 | Mistral | 1199 | — | — | — | 1.6K |
| 78 | Mercury 2 | Inception AI | 1165 | $0.25 | $0.75 | 128K | 946 |
| 79 | — | — | 1150 | $0.20 | $0.50 | 2M | 933 |
| 80 | — | — | 1140 | $0.20 | $1.50 | — | 982 |
| 81 | Devstral Medium 2507 | Mistral | 1091 | $0.40 | $2.00 | 128K | 992 |
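
Elo gaps are easier to judge as head-to-head win probabilities than as raw point differences. The sketch below assumes the conventional Elo logistic curve (base 10, 400-point scale). This page does not say how Code Arena computes its scores, so treat the formula and the `scale` parameter as assumptions; the function name `expected_win_rate` is made up for illustration. The three ratings are copied from the Code Arena table.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Chance that A is preferred over B under the classic Elo logistic model.

    Assumption: base-10 logistic with a 400-point scale. The leaderboard
    itself may use a different fitting procedure.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the Code Arena table (ranks 1, 2, and 81).
rank_1 = 1567
rank_2 = 1562
rank_81 = 1091

print(f"rank 1 vs rank 2:  {expected_win_rate(rank_1, rank_2):.3f}")
print(f"rank 1 vs rank 81: {expected_win_rate(rank_1, rank_81):.3f}")
```

Under these assumptions, the 5-point gap between ranks 1 and 2 is close to a coin flip (roughly 51%), while the 476-point gap between ranks 1 and 81 corresponds to about a 94% expected preference rate. Small gaps near the top of the table should therefore be read with the vote counts in mind.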
LiveBench Coding
| # | Model | Provider | Score | Input $/M | Output $/M | Context | CI |
|---|---|---|---|---|---|---|---|
| 1 | GPT-5.2 Codex | OpenAI | 83.6% | — | — | — | — |
| 2 | GPT-5.5 Thinking xHigh Effort | OpenAI | 82.5% | — | — | — | — |
| 3 | Claude 4.7 Opus Thinking xHigh Effort | Anthropic | 82.1% | — | — | — | — |
| 4 | Claude 4 Sonnet | Anthropic | 80.7% | — | — | — | — |
| 5 | GPT-5.1 Codex Max High | OpenAI | 80.7% | — | — | — | — |
| 6 | Claude Sonnet 4.5 Thinking | Anthropic | 80.4% | — | — | — | — |
| 7 | Claude 4.5 Opus Thinking High Effort | Anthropic | 79.7% | — | — | — | — |
| 8 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 79.3% | — | — | — | — |
| 9 | GPT-5.3 Instant | OpenAI | 78.6% | — | — | — | — |
| 10 | Kimi K2.6 Thinking | Moonshot AI | 78.6% | — | — | — | — |
| 11 | Claude 4.5 Opus Medium Effort | Anthropic | 78.5% | — | — | — | — |
| 12 | Claude 4.6 Opus Thinking High Effort | Anthropic | 78.2% | — | — | — | — |
| 13 | Gemini 3.5 Flash High | Google | 78.2% | — | — | — | — |
| 14 | GPT-5.3 Codex High | OpenAI | 78.2% | — | — | — | — |
| 15 | Qwen 3.6 Plus | Alibaba | 78.2% | — | — | — | — |
| 16 | Kimi K2.5 Thinking | Moonshot AI | 77.9% | — | — | — | — |
| 17 | GPT-5.4 Thinking xHigh Effort | OpenAI | 77.5% | — | — | — | — |
| 18 | Claude 4 Sonnet Thinking | Anthropic | 77.5% | — | — | — | — |
| 19 | GPT-5.1 No Thinking | OpenAI | 77.5% | — | — | — | — |
| 20 | Gemini 3.1 Pro Preview High | Google | 76.5% | — | — | — | — |
| 21 | GPT-5.2 No Thinking | OpenAI | 76.5% | — | — | — | — |
| 22 | GPT-5.2 High | OpenAI | 76.1% | — | — | — | — |
| 23 | Claude 4.1 Opus | Anthropic | 76.1% | — | — | — | — |
| 24 | Claude Sonnet 4.5 | Anthropic | 76.1% | — | — | — | — |
| 25 | Gemini 2.5 Pro (Max Thinking) | Google | 75.7% | — | — | — | — |
| 26 | DeepSeek V3.2 | DeepSeek | 75.7% | — | — | — | — |
| 27 | GLM 5.1 | Z.AI | 75.4% | — | — | — | — |
| 28 | Claude 4.1 Opus Thinking | Anthropic | 74.7% | — | — | — | — |
| 29 | Gemini 3 Pro Preview High | Google | 74.6% | — | — | — | — |
| 30 | Kimi K2 Instruct | Moonshot AI | 74.3% | — | — | — | — |
| 31 | Qwen 3.7 Max | Alibaba | 74.2% | — | — | — | — |
| 32 | Gemini 3 Flash Preview High | Google | 73.9% | — | — | — | — |
| 33 | GLM 5V Turbo | Z.AI | 73.9% | — | — | — | — |
| 34 | GLM 5 | Z.AI | 73.6% | — | — | — | — |
| 35 | DeepSeek V3.2 Exp | DeepSeek | 73.2% | — | — | — | — |
| 36 | Grok 4 | xAI | 73.1% | — | — | — | — |
| 37 | GLM 4.7 | Z.AI | 73.1% | — | — | — | — |
| 38 | Claude Haiku 4.5 Thinking | Anthropic | 72.8% | — | — | — | — |
| 39 | GPT-5.1 High | OpenAI | 72.5% | — | — | — | — |
| 40 | Claude Haiku 4.5 | Anthropic | 72.2% | — | — | — | — |
| 41 | GPT-5.4 Nano xHigh | OpenAI | 72.1% | — | — | — | — |
| 42 | GPT-5 Pro | OpenAI | 72.1% | — | — | — | — |
| 43 | GPT-5.1 Codex | OpenAI | 71.8% | — | — | — | — |
| 44 | Qwen 3.6 27B | Alibaba | 71.8% | — | — | — | — |
| 45 | GPT-5.4 Mini xHigh | OpenAI | 71.6% | — | — | — | — |
| 46 | GLM 4.6 | Z.AI | 71.0% | — | — | — | — |
| 47 | Minimax M2.5 | Minimax | 70.7% | — | — | — | — |
| 48 | DeepSeek V3.2 Exp Thinking | DeepSeek | 70.1% | — | — | — | — |
| 49 | DeepSeek V4 Pro | DeepSeek | 70.0% | — | — | — | — |
| 50 | Grok 4.3 | xAI | 69.9% | — | — | — | — |
| 51 | GPT-5.1 Codex Mini | OpenAI | 69.9% | — | — | — | — |
| 52 | Grok 4.1 Fast | xAI | 69.6% | — | — | — | — |
| 53 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 69.6% | — | — | — | — |
| 54 | DeepSeek V4 Flash | DeepSeek | 69.2% | — | — | — | — |
| 55 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 69.0% | — | — | — | — |
| 56 | MiMo V2 Pro | Xiaomi | 68.8% | — | — | — | — |
| 57 | Gemini 3.1 Flash Lite Preview High | Google | 68.5% | — | — | — | — |
| 58 | GPT-5 Mini High | OpenAI | 68.2% | — | — | — | — |
| 59 | Qwen 3 Next 80B A3B Instruct | Alibaba | 68.2% | — | — | — | — |
| 60 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 67.5% | — | — | — | — |
| 61 | Kimi K2 Thinking | Moonshot AI | 67.4% | — | — | — | — |
| 62 | Devstral 2 | Mistral | 66.8% | — | — | — | — |
| 63 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 66.4% | — | — | — | — |
| 64 | Grok 4.20 Beta | xAI | 66.1% | — | — | — | — |
| 65 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 66.0% | — | — | — | — |
| 66 | Qwen 3 32B | Alibaba | 66.0% | — | — | — | — |
| 67 | Trinity Large Preview | Arcee | 65.7% | — | — | — | — |
| 68 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 65.4% | — | — | — | — |
| 69 | Qwen 3.6 Flash | Alibaba | 64.9% | — | — | — | — |
| 70 | DeepSeek V3.2 Thinking | DeepSeek | 64.6% | — | — | — | — |
| 71 | Grok Code Fast | xAI | 64.4% | — | — | — | — |
| 72 | GLM 4.6V | Z.AI | 64.2% | — | — | — | — |
| 73 | GPT-5 Nano High | OpenAI | 62.4% | — | — | — | — |
| 74 | Qwen 3 Next 80B A3B Thinking | Alibaba | 60.7% | — | — | — | — |
| 75 | Gemma 4 31B | Google | 60.3% | — | — | — | — |
| 76 | GPT OSS 120b | OpenAI | 60.2% | — | — | — | — |
| 77 | Grok 4.20 Beta (Non-Reasoning) | xAI | 58.5% | — | — | — | — |
| 78 | Elephant Alpha | OpenRouter | 56.7% | — | — | — | — |
| 79 | Minimax M2.7 | Minimax | 54.9% | — | — | — | — |
| 80 | Grok 4.1 Fast (Non-Reasoning) | xAI | 54.3% | — | — | — | — |
| 81 | Nemotron 3 Super 120B A12B | NVIDIA | 54.1% | — | — | — | — |
| 82 | Qwen 3 30B A3B | Alibaba | 48.9% | — | — | — | — |