Data updated 36 minutes ago. Sources: Code Arena · Text Arena · LiveBench · LiveCodeBench · Aider Polyglot
Live LLM Benchmark Data
Which LLM is actually winning? Most leaderboard sites are JS-rendered SPAs that AI search engines can't read. We crawl them and serve the data as static HTML so both humans and AI systems can read it.
An honest aggregate of the benchmarks that matter — Code Arena, Text Arena, LiveBench, LiveCodeBench — refreshed hourly. No marketing, no cherry-picked numbers.
Tracked sources
- Aider Polyglot · 69 models
- Code Arena · 60 models
- LiveBench · 71 models
- LiveCodeBench · 28 models
- Text Arena · 339 models
- WebDev Arena · 10 models (static)
Coding
Coding benchmarks
Real code generation, repo-level fixes, and competitive programming.
Current #1 · Aider Polyglot
Reasoning
Reasoning benchmarks
Multi-step reasoning, math, and contamination-free language tasks.
Current #1 · LiveBench
General Chat
General chat benchmarks
Open-ended chat preference rankings from real user votes.
Current #1 · Text Arena
Community pulse
What r/LocalLLaMA, r/ClaudeAI, r/OpenAI, r/singularity, and more are talking about right now.
New Yorker published a major investigation into Sam Altman and OpenAI today — based on never-before-disclosed internal memos and 100+ interviews
Something happened to Opus 4.6's reasoning effort
this is how an AI generated cow looked 12 years ago
Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run
Opus 4.6 destroys a user’s session costing them real money
Need help choosing the right AI model for your business?
Benchmarks are a starting point, not an answer. The right model depends on your workload, budget, and integration constraints — let's figure it out together.