Data analysis benchmarks

Scores for structured data interpretation, querying, and analysis tasks, taken from LiveBench.

LiveBench Data Analysis

#	Model	Provider	Score	Input $/M	Output $/M	Context	CI
1	GPT-5.5 Thinking xHigh Effort	OpenAI	81.1%	—	—	—	—
2	Claude Fable 5 Thinking xHigh Effort*	Anthropic	80.0%	—	—	—	—
3	GPT-5.4 Thinking xHigh Effort	OpenAI	79.3%	—	—	—	—
4	Gemini 3.1 Pro Preview High	Google	78.5%	—	—	—	—
5	Claude 4.8 Opus Thinking xHigh Effort	Anthropic	78.3%	—	—	—	—
6	Claude 4.7 Opus Thinking xHigh Effort	Anthropic	78.3%	—	—	—	—
7	GPT-5.2 Codex	OpenAI	78.2%	—	—	—	—
8	GPT-5.2 High	OpenAI	78.2%	—	—	—	—
9	Claude 4.6 Sonnet Thinking Medium Effort	Anthropic	78.0%	—	—	—	—
10	Minimax M3	MiniMax	76.2%	—	—	—	—
11	Gemini 3 Flash Preview High	Google	74.8%	—	—	—	—
12	DeepSeek V4 Pro	DeepSeek	74.5%	—	—	—	—
13	Claude 4.5 Opus Thinking High Effort	Anthropic	74.4%	—	—	—	—
14	Gemini 3 Pro Preview High	Google	74.4%	—	—	—	—
15	GLM 5.2	Z.AI	73.7%	—	—	—	—
16	Claude Sonnet 5 xHigh Effort	Anthropic	72.5%	—	—	—	—
17	Qwen 3.7 Max	Alibaba	71.8%	—	—	—	—
18	GPT-5.4 Mini xHigh	OpenAI	71.0%	—	—	—	—
19	Grok Build 0.1	xAI	70.8%	—	—	—	—
20	Qwen 3.6 27B	Alibaba	70.4%	—	—	—	—
21	GPT-5.1 Codex Max High	OpenAI	70.1%	—	—	—	—
22	Qwen 3.6 Plus	Alibaba	69.9%	—	—	—	—
23	Claude 4.6 Opus Thinking High Effort	Anthropic	69.9%	—	—	—	—
24	GPT-5.1 High	OpenAI	69.6%	—	—	—	—
25	DeepSeek V4 Flash	DeepSeek	68.0%	—	—	—	—
26	GLM 5	Z.AI	67.9%	—	—	—	—
27	GPT-5.4 Nano xHigh	OpenAI	67.6%	—	—	—	—
28	Kimi K2.6 Thinking	Moonshot AI	65.1%	—	—	—	—
29	Gemini 3.5 Flash High	Google	64.9%	—	—	—	—
30	Grok 4	xAI	63.4%	—	—	—	—
31	GLM 5.1	Z.AI	63.2%	—	—	—	—
32	Grok 4.20 Beta	xAI	62.9%	—	—	—	—
33	GPT-5.3 Codex High	OpenAI	62.7%	—	—	—	—
34	Kimi K2.7 Code	Moonshot AI	62.7%	—	—	—	—
35	Kimi K2.5 Thinking	Moonshot AI	61.4%	—	—	—	—
36	Gemini 2.5 Flash (Max Thinking) (2025-09-25)	Google	61.0%	—	—	—	—
37	GPT-5.1 Codex	OpenAI	60.8%	—	—	—	—
38	Claude Haiku 4.5 Thinking	Anthropic	59.3%	—	—	—	—
39	Qwen 3.6 Flash	Alibaba	58.8%	—	—	—	—
40	Gemma 4 31B	Google	58.8%	—	—	—	—
41	GPT-5 Pro	OpenAI	57.0%	—	—	—	—
42	Claude Sonnet 4.5 Thinking	Anthropic	57.0%	—	—	—	—
43	Minimax M2.7	MiniMax	56.3%	—	—	—	—
44	Grok 4.3	xAI	55.8%	—	—	—	—
45	GPT-5 Mini High	OpenAI	55.2%	—	—	—	—
46	GLM 4.7	Z.AI	55.2%	—	—	—	—
47	Gemini 3.1 Flash Lite Preview High	Google	54.9%	—	—	—	—
48	Claude 4 Sonnet Thinking	Anthropic	54.6%	—	—	—	—
49	GLM 5V Turbo	Z.AI	54.1%	—	—	—	—
50	Qwen 3 Next 80B A3B Thinking	Alibaba	53.6%	—	—	—	—
51	Kimi K2 Thinking	Moonshot AI	52.3%	—	—	—	—
52	Grok 4.1 Fast	xAI	52.2%	—	—	—	—
53	Qwen 3 235B A22B Thinking 2507	Alibaba	52.2%	—	—	—	—
54	GLM 4.6	Z.AI	52.0%	—	—	—	—
55	Gemini 2.5 Pro (Max Thinking)	Google	51.6%	—	—	—	—
56	DeepSeek V3.2 Exp Thinking	DeepSeek	51.5%	—	—	—	—
57	DeepSeek V3.2 Thinking	DeepSeek	50.0%	—	—	—	—
58	Qwen 3 Next 80B A3B Instruct	Alibaba	49.8%	—	—	—	—
59	GPT-5.1 Codex Mini	OpenAI	49.7%	—	—	—	—
60	Minimax M2.5	MiniMax	49.6%	—	—	—	—
61	MiMo V2 Pro	Xiaomi	49.2%	—	—	—	—
62	Grok Code Fast	xAI	49.0%	—	—	—	—
63	Claude 4.1 Opus Thinking	Anthropic	49.0%	—	—	—	—
64	GPT-5.3 Instant	OpenAI	48.0%	—	—	—	—
65	Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25)	Google	47.9%	—	—	—	—
66	GPT-5.2 No Thinking	OpenAI	47.7%	—	—	—	—
67	Gemini 2.5 Flash (Max Thinking) (2025-06-05)	Google	47.3%	—	—	—	—
68	Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17)	Google	47.0%	—	—	—	—
69	Claude Sonnet 4.5	Anthropic	47.0%	—	—	—	—
70	Qwen 3 32B	Alibaba	46.5%	—	—	—	—
71	GLM 4.6V	Z.AI	46.4%	—	—	—	—
72	Claude 4.5 Opus Medium Effort	Anthropic	45.5%	—	—	—	—
73	Claude 4.1 Opus	Anthropic	45.4%	—	—	—	—
74	Claude Haiku 4.5	Anthropic	45.1%	—	—	—	—
75	DeepSeek V3.2	DeepSeek	45.0%	—	—	—	—
76	Qwen 3 30B A3B	Alibaba	44.9%	—	—	—	—
77	Qwen 3 235B A22B Instruct 2507	Alibaba	44.7%	—	—	—	—
78	DeepSeek V3.2 Exp	DeepSeek	44.3%	—	—	—	—
79	Claude 4 Sonnet	Anthropic	44.1%	—	—	—	—
80	GPT-5.1 No Thinking	OpenAI	44.1%	—	—	—	—
81	Grok 4.20 Beta (Non-Reasoning)	xAI	43.5%	—	—	—	—
82	GPT-5 Nano High	OpenAI	43.4%	—	—	—	—
83	Kimi K2 Instruct	Moonshot AI	43.3%	—	—	—	—
84	Nemotron 3 Ultra 550B A55B	NVIDIA	42.0%	—	—	—	—
85	Grok 4.1 Fast (Non-Reasoning)	xAI	40.6%	—	—	—	—
86	Trinity Large Preview	Arcee AI	40.3%	—	—	—	—
87	Devstral 2	Mistral	39.1%	—	—	—	—
88	GPT OSS 120b	OpenAI	38.8%	—	—	—	—
89	Elephant Alpha	OpenRouter	38.5%	—	—	—	—
90	Nemotron 3 Super 120B A12B	NVIDIA	21.2%	—	—	—	—

* Losing out due to stricter content moderation.

