Every frontier model, ranked live

Capability scores for 47 AI models, combining LMSYS Arena, MMLU, HumanEval, and other benchmarks into our weighted composite. Three releases this week.

UPDATED · APR 17 · 10:00 UTC


#	Model	Developer	Score	24h change	7d change	Org value	Queries/day	Category
1	Claude 3.7 Sonnet (new)	Anthropic	94.2	2.1%	5.8%	$8.5B	142K	Frontier
2	GPT-4o	OpenAI	91.8	0.3%	1.2%	$157B	891K	Frontier
3	Gemini 2.0 Ultra	Google DeepMind	89.3	1.7%	4.1%	$1.8T	324K	Frontier
4	Grok 3	xAI	86.1	4.2%	9.3%	$24B	67K	Frontier
5	Llama 4 Scout (new)	Meta AI	81.4	6.8%	14.2%	$1.4T	2.1M	Open Source
6	Mistral Large 3	Mistral AI	79.2	3.1%	6.7%	$1.1B	89K	Open Source
7	o4-mini (new)	OpenAI	78.1	8.4%	22.1%	$157B	234K	Reasoning
8	Claude 3.7 Haiku	Anthropic	76.4	1.1%	3.2%	$8.5B	412K	Frontier
9	Gemini 2.0 Flash	Google DeepMind	74.8	0.4%	0.9%	$1.8T	1.1M	Frontier
10	DeepSeek V3	DeepSeek	73.1	0.6%	2.8%	$8B	98K	Open Source
11	Qwen 3 72B (new)	Alibaba	71.9	4.3%	11.2%	$210B	124K	Open Source
12	o3-pro	OpenAI	70.6	1.2%	0.8%	$157B	34K	Reasoning
13	Command R+	Cohere	68.4	0.8%	2.1%	$5.5B	18K	Enterprise
14	Phi-4	Microsoft	66.2	0.3%	1.4%	$3.1T	42K	Open Source
15	Claude 3.5 Sonnet	Anthropic	64.8	2.1%	5.4%	$8.5B	67K	Frontier

Methodology

How the capability score is computed

The score is a weighted composite of LMSYS Arena Elo (40%), MMLU (20%), HumanEval (15%), GPQA (10%), ARC-AGI (10%), and community benchmark reports (5%). Each raw score is min-max normalized to 0–100 across all ranked models, with the frontier and open-source tiers pooled together, before the weights are applied. Scores are recomputed every 6 hours by a cron job.
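A minimal Python sketch of the composite, assuming each benchmark is min-max normalized across the full model pool and the normalized values are then summed with the stated weights. The benchmark keys (`arena_elo`, `mmlu`, `humaneval`, `gpqa`, `arc_agi`, `community`), the tie-handling rule, and the sample models are hypothetical, and the methodology does not say whether the weighted sum is rescaled again afterwards, so this sketch leaves it as is.

```python
from typing import Dict, List

# Weights from the methodology; they sum to 1.0. Key names are hypothetical.
WEIGHTS: Dict[str, float] = {
    "arena_elo": 0.40,
    "mmlu": 0.20,
    "humaneval": 0.15,
    "gpqa": 0.10,
    "arc_agi": 0.10,
    "community": 0.05,
}


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale linearly to 0-100: the pool minimum maps to 0, the maximum to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: if every model ties, all of them get 100.
        return [100.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def composite_scores(raw: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """raw maps model -> {benchmark key -> raw score}; every model must have every key.

    Returns model -> weighted composite on a 0-100 scale, rounded to one decimal.
    """
    models = list(raw)
    normalized: Dict[str, Dict[str, float]] = {m: {} for m in models}
    for bench in WEIGHTS:
        # Normalize one benchmark across the whole pool (all tiers together).
        column = min_max_normalize([raw[m][bench] for m in models])
        for m, value in zip(models, column):
            normalized[m][bench] = value
    return {
        m: round(sum(WEIGHTS[b] * normalized[m][b] for b in WEIGHTS), 1)
        for m in models
    }


if __name__ == "__main__":
    # Hypothetical raw numbers: A leads and C trails on every benchmark.
    raw = {
        "A": {"arena_elo": 1300, "mmlu": 90, "humaneval": 92, "gpqa": 70, "arc_agi": 50, "community": 8},
        "B": {"arena_elo": 1250, "mmlu": 85, "humaneval": 88, "gpqa": 60, "arc_agi": 30, "community": 6},
        "C": {"arena_elo": 1200, "mmlu": 80, "humaneval": 80, "gpqa": 50, "arc_agi": 10, "community": 4},
    }
    print(composite_scores(raw))  # → {'A': 100.0, 'B': 52.5, 'C': 0.0}
```

Because min-max normalization is relative to the pool, adding or removing a model can shift every other model's composite even if its own raw scores are unchanged.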

· 24h movement reflects Elo delta since previous day’s snapshot.
· Queries/day is estimated from public API telemetry and partner data.
· Org value is the developer's latest known private valuation or public market capitalization.
· Phase 1 uses mock data; real feeds go live in Phase 2.
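A minimal sketch of how the movement columns could be derived from stored snapshots. It assumes each column holds the signed percent change in Arena Elo between the latest snapshot and the most recent snapshot at least 24 hours (or 7 days) older. The notes above only describe the 24h column as an Elo delta while the table shows percentages, so the percent conversion, the 7d window, and the `(timestamp, elo)` history format are all assumptions, as are the sample values.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def pct_change(previous: float, current: float) -> Optional[float]:
    """Signed percent change from previous to current, rounded to one decimal."""
    if previous == 0:
        return None  # No meaningful baseline.
    return round(100.0 * (current - previous) / previous, 1)


def movement(history: List[Tuple[datetime, float]], window: timedelta) -> Optional[float]:
    """history holds (timestamp, Elo) pairs sorted by time; the last one is the latest.

    Compares the latest Elo with the newest snapshot taken at or before
    (latest time - window). Returns None when no snapshot is old enough.
    """
    if not history:
        return None
    latest_time, latest_elo = history[-1]
    cutoff = latest_time - window
    times = [t for t, _ in history]
    idx = bisect_right(times, cutoff) - 1  # last snapshot at or before the cutoff
    if idx < 0:
        return None
    return pct_change(history[idx][1], latest_elo)


if __name__ == "__main__":
    # Hypothetical snapshot history for one model.
    t0 = datetime(2025, 4, 9, 10, 0)
    history = [
        (t0, 1250.0),
        (t0 + timedelta(days=7), 1300.0),
        (t0 + timedelta(days=8), 1326.0),
    ]
    print(movement(history, timedelta(hours=24)))  # → 2.0
    print(movement(history, timedelta(days=7)))  # → 6.1
```

This version keeps the sign, so a model losing Elo would show a negative value; the table above displays magnitudes only.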