Data updated 36 minutes ago. Sources: Code Arena · Text Arena · LiveBench · LiveCodeBench · Aider Polyglot
Live LLM Benchmark Data
Which LLM is actually winning? Most leaderboard sites are JS-rendered SPAs that AI search engines can't read. We crawl them and serve the data as static HTML so both humans and AI systems can read it.
An honest aggregate of the benchmarks that matter — Code Arena, Text Arena, LiveBench, LiveCodeBench — refreshed hourly. No marketing, no cherry-picked numbers.
Tracked sources
- Aider Polyglot · 69 models
- Code Arena · 60 models
- LiveBench · 71 models
- LiveCodeBench · 28 models
- Text Arena · 339 models
- WebDev Arena · 10 models (static)
Coding
Coding benchmarks
Real code generation, repo-level fixes, and competitive programming.
Current #1 · Aider Polyglot
Reasoning
Reasoning benchmarks
Multi-step reasoning, math, and contamination-free language tasks.
Current #1 · LiveBench
General Chat
General chat benchmarks
Open-ended chat preference rankings from real user votes.
Current #1 · Text Arena
Community pulse
What r/LocalLLaMA, r/ClaudeAI, r/OpenAI, r/singularity, and more are talking about right now.
New Yorker published a major investigation into Sam Altman and OpenAI today — based on never-before-disclosed internal memos and 100+ interviews
Something happened to Opus 4.6's reasoning effort
this is how an AI generated cow looked 12 years ago
Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run
Opus 4.6 destroys a user’s session costing them real money
Need help choosing the right AI model for your business?
Benchmarks are a starting point, not an answer. The right model depends on your workload, budget, and integration constraints — let's figure it out together.