Quick verdict
| Use case | Recommended | Why |
|---|---|---|
| General chat / agent | Claude Sonnet 4.6 or GPT-4.1 | Both balanced-tier. GPT-4.1 cheaper ($2/$8 vs $3/$15). Claude better at coding agents in current evals. |
| Hardest reasoning task | GPT-5.5 (reasoning) or Claude Opus 4.7 | GPT-5.5 tops the Artificial Analysis Index; Opus is close behind, with 1M context as the differentiator. |
| Cheap high-volume calls | GPT-5-nano ($0.05/$0.40) | No Claude equivalent at this price tier. Haiku 4.5 is 12–20× the price. |
| Latency-sensitive UX | GPT-4o-mini or Claude Haiku 4.5 | GPT-4o-mini ~130 tok/s, Haiku 4.5 ~91 tok/s. Both viable; GPT typically faster. |
| Long context (> 400K tokens) | Claude Sonnet 4.6 (1M) | Flagship GPT-5 caps at 400K; Sonnet 4.6 and Opus 4.7 are 1M, and Claude's cached reads make re-using huge contexts cheap. (GPT-4.1 and GPT-5.5 also reach 1M.) |
| Heavy prompt caching | Claude family | 90% off cached reads (vs 50% on OpenAI). The 90% read discount repays the 25% write surcharge after a single cache hit. |
Headline prices
Per-million-token pricing (input/output), before tokenizer adjustment:
| Tier | Anthropic | OpenAI |
|---|---|---|
| Reasoning (top) | — | GPT-5.5 (reasoning): $5/$30 |
| Flagship | Claude Opus 4.7: $5/$25 | GPT-5: $1.25/$10 |
| Balanced | Claude Sonnet 4.6: $3/$15 | GPT-4.1: $2/$8 |
| Fast | Claude Haiku 4.5: $1/$5 | GPT-4o Mini: $0.15/$0.60 |
| Tiny | — | GPT-5 Nano: $0.05/$0.40 |
The tokenizer trap (why Claude is more expensive than it looks)
Claude 4.x ships a new BPE tokenizer. The same English text turns into roughly 35% more tokens than it does under OpenAI's o200k_base tokenizer. Per-token sticker prices look directly comparable; the per-task bill is not.
Worked example for a typical chat turn (1K prompt, 500 output):
| Model | Tokens billed | Cost if billed at 1,000/500 | Real cost (tokenizer-adjusted) |
|---|---|---|---|
| GPT-5 | 1,000 in / 500 out | $0.0063 | $0.0063 |
| Claude Opus 4.7 | 1,350 in / 675 out | $0.0175 | $0.0236 |
| Claude Sonnet 4.6 | 1,350 in / 675 out | $0.0105 | $0.0142 |
| GPT-4.1 | 1,000 in / 500 out | $0.0060 | $0.0060 |
At flagship-vs-flagship, GPT-5 costs $0.0063 per turn vs Claude Opus 4.7 at $0.0236: roughly a 3.7× real-cost gap, noticeably wider than the ~2.8× gap the sticker prices alone suggest.
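To make the adjustment mechanical, here is a back-of-the-envelope helper (the function is ours, not from either SDK; the 1.35× multiplier is the English-text average quoted above and varies by content and language):

```python
def real_cost_per_turn(input_price: float, output_price: float,
                       prompt_tokens: int, output_tokens: int,
                       tokenizer_multiplier: float = 1.0) -> float:
    """Dollar cost of one turn. Prices are per million tokens; token counts
    are o200k_base baseline counts, inflated by the model's tokenizer."""
    tokens_in = prompt_tokens * tokenizer_multiplier
    tokens_out = output_tokens * tokenizer_multiplier
    return (tokens_in * input_price + tokens_out * output_price) / 1e6

# 1K-prompt / 500-output turn, counted with o200k_base:
gpt5 = real_cost_per_turn(1.25, 10.0, 1_000, 500)        # $0.00625
opus = real_cost_per_turn(5.00, 25.0, 1_000, 500, 1.35)  # $0.02363
print(f"GPT-5 ${gpt5:.4f} vs Opus 4.7 ${opus:.4f} ({opus / gpt5:.1f}x)")
```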
Speed: GPT family is faster on average
| Model | Tok/s (avg, public telemetry) |
|---|---|
| GPT-4o Mini | 130 |
| GPT-4.1 Mini | 110 |
| GPT-5 | 100 |
| GPT-4o | 95 |
| Claude Haiku 4.5 | 91 |
| GPT-4.1 | 90 |
| GPT-5.5 (reasoning) | 68 |
| Claude Opus 4.7 | 49 |
| Claude Sonnet 4.6 | 45 |
For latency-sensitive UX (chat-as-you-type, voice agents, autocomplete) the GPT family is generally the better pick. Claude's slower decode suits long-form responses that users read after completion, but the lag is visible in interactive contexts.
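Throughput maps directly onto perceived wait. A quick sketch using the averages above (time-to-first-token, which also differs by provider, is ignored here):

```python
# Seconds to stream a full 500-token response at average throughput.
throughput_tok_s = {
    "GPT-4o Mini": 130,
    "GPT-5": 100,
    "Claude Haiku 4.5": 91,
    "Claude Opus 4.7": 49,
    "Claude Sonnet 4.6": 45,
}
for model, tps in throughput_tok_s.items():
    print(f"{model:18s} {500 / tps:5.1f}s")
# GPT-4o Mini ~3.8s vs Claude Sonnet 4.6 ~11.1s: a user-visible difference.
```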
Intelligence: closer than the marketing suggests
Composite of public benchmarks (Artificial Analysis Intelligence Index v4, LMArena Elo, MMLU/GPQA), rescaled 0–100:
| Model | Intelligence |
|---|---|
| GPT-5.5 (reasoning) | 88 |
| Claude Opus 4.7 | 86 |
| GPT-5.4 | 84 |
| Claude Sonnet 4.6 | 81 |
| GPT-5 | 80 |
| GPT-4.1 | 70 |
| GPT-5 Mini | 65 |
| Claude Haiku 4.5 | 64 |
| GPT-4o | 62 |
| GPT-4o Mini | 52 |
The top five are within 8 points, inside typical run-to-run variance on most evals. Don't pick a model on intelligence score alone unless you've evaluated on your specific task; the cost, speed, and feature differences will matter more in production.
Context window: 1M on both top Claude tiers
| Model | Context window |
|---|---|
| Claude Sonnet 4.6 | 1M |
| Claude Opus 4.7 | 1M |
| GPT-5.5 (reasoning) | 1M |
| GPT-4.1 | 1M |
| GPT-5 | 400K |
| Claude Haiku 4.5 | 200K |
| GPT-4o Mini | 128K |
For massive-document processing or long-running agent workflows that re-read the same context many times, Claude Sonnet 4.6's 1M window plus the 90% caching discount makes it the most economical choice even with the tokenizer overhead; for low-repeat workloads that fit in 400K, the GPT family stays cheaper.
Caching: Anthropic deeper, OpenAI simpler
| Provider | Cached read | Cache write | Mechanism |
|---|---|---|---|
| Anthropic | 10% of input price (90% off) | +25% (5-min TTL) or +100% (1-hour TTL) | Explicit cache_control markers |
| OpenAI | 50% of input price (50% off) | None (automatic, no surcharge) | Automatic on prefixes ≥1024 tokens |
Break-even on Anthropic's 5-min cache vs not caching at all: a single cache hit, since the 90% read saving already outweighs the 25% write surcharge (the sketch below walks through the numbers). If your traffic re-uses system prompts, tool definitions, or document context within a few minutes, Anthropic's caching wins decisively. If your traffic is bursty with varying prefixes, OpenAI's automatic 50% discount with no surcharge is the safer default.
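A sketch of that break-even arithmetic, using Claude Sonnet 4.6's $3/M input price from the table above (the helper functions are ours; assumes the 5-minute TTL surcharge):

```python
def cached_total(input_price: float, prefix_tokens: int, calls: int,
                 write_surcharge: float = 0.25, read_discount: float = 0.90) -> float:
    """Cumulative input cost in dollars: one cache write, then cached reads."""
    per_call = input_price * prefix_tokens / 1e6
    return per_call * (1 + write_surcharge) + per_call * (1 - read_discount) * (calls - 1)

def uncached_total(input_price: float, prefix_tokens: int, calls: int) -> float:
    return input_price * prefix_tokens / 1e6 * calls

# 50K-token stable prefix (system prompt + tool defs + document):
for n in (1, 2, 5, 20):
    print(f"{n:2d} calls: cached ${cached_total(3.0, 50_000, n):.3f}"
          f" vs uncached ${uncached_total(3.0, 50_000, n):.3f}")
# 1 call : $0.188 vs $0.150  (the write surcharge loses)
# 2 calls: $0.203 vs $0.300  (one cache hit already wins)
```

Opting in looks like this on the Anthropic side; the `cache_control` marker is the real Messages API field, while the model id and prompt are placeholders:

```python
import anthropic

LONG_SYSTEM_PROMPT = "...your long, stable system prompt and tool context..."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-6",  # placeholder id for the model discussed here
    max_tokens=512,
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,              # the prefix to cache
        "cache_control": {"type": "ephemeral"},  # 5-minute TTL tier
    }],
    messages=[{"role": "user", "content": "First question against the cached prefix"}],
)
```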
Full caching mechanics: prompt caching deep dive →
Verdict by use case
Building a chatbot
Default to GPT-4.1 or Claude Sonnet 4.6. Both are strong balanced-tier options. GPT-4.1 is cheaper on sticker price, and cheaper still in practice, since it avoids Claude's tokenizer overhead. Claude Sonnet 4.6 has 1M context, which matters if you stuff a lot of history. If latency matters: GPT-4o-mini or Haiku 4.5.
Building a coding agent
Current evals favor Claude Sonnet 4.6 and Opus 4.7 on agentic coding tasks (SWE-bench, terminal-bench). Anthropic's caching of long codebase context pays back fast. GPT-5 is closer than the headlines suggest but historically Claude wins coding benchmarks by 5–10 points.
RAG over large documents
Claude Sonnet 4.6. Above 400K tokens of document it is the only option of the two, since that is GPT-5's cap. Below that, Claude's 90%-off cached reads make each repeat query cheaper at the margin than GPT-5's automatic 50%-off cache, but the higher base price, cache-write surcharge, and ~35% tokenizer overhead mean the cumulative bill only crosses over after many queries against the same document (roughly 15–20 under the assumptions sketched below). For one-shot or low-repeat RAG that fits in GPT's window, the GPT family wins on price.
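The crossover arithmetic, input side only (our helper; assumes the 1.35× tokenizer multiplier, that every query after the first is a cache hit on both providers, and that OpenAI's automatic caching covers the document prefix):

```python
def cumulative_input_cost(price: float, doc_tokens: float, queries: int,
                          first_mult: float, cached_mult: float) -> float:
    """Total input cost: first call at first_mult x price, the rest cached."""
    per_call = price * doc_tokens / 1e6
    return per_call * (first_mult + cached_mult * (queries - 1))

DOC = 300_000  # o200k_base token count; fits GPT-5's 400K window
for n in range(1, 50):
    sonnet = cumulative_input_cost(3.00, DOC * 1.35, n, 1.25, 0.10)
    gpt5 = cumulative_input_cost(1.25, DOC, n, 1.00, 0.50)
    if sonnet < gpt5:
        print(f"Claude cheaper from query {n}")  # query 19 under these assumptions
        break
```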
High-volume cheap calls
GPT-5-nano or GPT-4o-mini. No Claude tier exists below Haiku 4.5's $1/$5 pricing. For workloads in the millions of calls per day, the GPT family is materially cheaper.
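A rough projection of what that gap means per day (the workload shape is assumed: 1K input / 200 output per call, with the 1.35× multiplier applied to Haiku):

```python
CALLS_PER_DAY = 5_000_000  # assumed volume

def daily_cost(in_price: float, out_price: float,
               in_tok: int, out_tok: int, mult: float = 1.0) -> float:
    """Dollars per day; prices per million tokens, counts scaled by tokenizer."""
    per_call = (in_tok * mult * in_price + out_tok * mult * out_price) / 1e6
    return CALLS_PER_DAY * per_call

print(f"GPT-5-nano: ${daily_cost(0.05, 0.40, 1_000, 200):,.0f}/day")        # ~$650
print(f"Haiku 4.5:  ${daily_cost(1.00, 5.00, 1_000, 200, 1.35):,.0f}/day")  # ~$13,500
```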
Hardest reasoning
GPT-5.5 (reasoning) or Claude Opus 4.7 with adaptive thinking enabled. Both top the intelligence rankings. Output costs are deceptive: both bill hidden reasoning tokens that can inflate the visible output bill by 4–10×. Test on your specific task; differences are smaller than benchmark scores suggest.
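Budget for those hidden tokens explicitly. A sketch, assuming billed output runs 4–10× the visible answer (the multiplier depends on the task and the thinking budget you set):

```python
def reasoning_output_cost(output_price: float, visible_tokens: int,
                          hidden_multiplier: float) -> float:
    """Output bill when reasoning tokens are billed but never shown."""
    return output_price * visible_tokens * hidden_multiplier / 1e6

# A 1K-token visible answer on GPT-5.5 (reasoning), $30/M output:
for mult in (1, 4, 10):
    print(f"{mult:2d}x hidden -> ${reasoning_output_cost(30.0, 1_000, mult):.3f}")
# 1x -> $0.030, 4x -> $0.120, 10x -> $0.300 for the same visible answer
```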
Reference
- Anthropic API pricing
- OpenAI API pricing
- Artificial Analysis Intelligence Index
- Full price table for all providers
- Prompt caching deep dive
- Hidden costs of LLM apps — including the tokenizer trap explored above
FAQ
Is Claude or GPT cheaper?
It depends on the tier you compare. Flagship-vs-flagship: GPT-5 ($1.25/$10) is materially cheaper than Claude Opus 4.7 ($5/$25), with Claude's 35% tokenizer overhead making the gap even wider. Balanced-tier: GPT-4.1 ($2/$8) edges out Claude Sonnet 4.6 ($3/$15). Fast-tier: Gemini Flash-Lite undercuts both, but among Claude/GPT only, GPT-4o-mini ($0.15/$0.60) is the cheapest mainstream option. The reasoning-model picture is closer.
Is Claude smarter than GPT?
On the public benchmark composite (Artificial Analysis Intelligence Index v4 + LMArena Elo), GPT-5.5 is currently #1 (88), with Claude Opus 4.7 (86) and Gemini 3.1 Pro (86) tied just behind, then GPT-5.4 (84), Claude Sonnet 4.6 (81), and GPT-5 (80) clustered next. Gaps of 3–5 points are smaller than within-model variance; pick on cost, speed, or specific capability strengths instead.
Why does the same prompt cost more on Claude than the sticker price suggests?
Claude 4.x ships a new BPE tokenizer that produces roughly 35% more tokens than older Claude models or GPT's o200k_base on the same English text. So a prompt OpenAI bills as 1,000 tokens, Claude bills as ~1,350. You compare per-token prices, but the actual API charge is computed on each tokenizer's own count.
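If you want real counts rather than the 35% heuristic, measure both sides. tiktoken's o200k_base is exact for the GPT side; the Anthropic SDK exposes a token-counting endpoint, called here with a placeholder model id:

```python
import tiktoken   # pip install tiktoken
import anthropic  # pip install anthropic

text = open("prompt.txt").read()

gpt_tokens = len(tiktoken.get_encoding("o200k_base").encode(text))

client = anthropic.Anthropic()  # needs ANTHROPIC_API_KEY
count = client.messages.count_tokens(
    model="claude-sonnet-4-6",  # placeholder id for the model discussed here
    messages=[{"role": "user", "content": text}],
)
print(gpt_tokens, count.input_tokens, count.input_tokens / gpt_tokens)
```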
Which is faster, Claude or GPT?
GPT family is faster on average. Average output throughput: GPT-4o-mini ~130 tok/s, GPT-5 ~100 tok/s; Claude Sonnet 4.6 ~45 tok/s, Claude Opus 4.7 ~49 tok/s. Claude Haiku 4.5 (~91 tok/s) is the closest to GPT speeds. For latency-sensitive UX (autocomplete, voice), GPT or a Groq-hosted Llama is typically the better pick.
Which has better caching?
Anthropic's caching gives a deeper discount (90% off cached reads vs OpenAI's automatic 50%), but you pay a 25% surcharge to write the cache, and a single cache hit already repays it. OpenAI's caching is automatic with no surcharge: first call full price, subsequent cache hits 50% off. For high-traffic apps with stable prefixes (system prompts, documents), Anthropic wins on absolute cost. For bursty or varying traffic, OpenAI wins on simplicity.
When would I pick Claude over GPT?
Three cases. (1) You need 1M-token context on a top-tier model: Claude Sonnet 4.6 and Opus 4.7 both have it, while flagship GPT-5 caps at 400K (on the GPT side, only GPT-4.1 and GPT-5.5 reach 1M). (2) You're doing heavy prompt-caching with stable prefixes: Anthropic's 90%-off reads beat OpenAI's 50%. (3) The specific evals on your tasks favor Claude (often code, complex multi-step reasoning, coding agents). Otherwise GPT is usually cheaper and faster.
When would I pick GPT over Claude?
Latency-sensitive apps (chat UIs, voice, autocomplete), price-sensitive workloads at scale (GPT-5-nano at $0.05/$0.40 has no Claude equivalent), and mature ecosystem features (DALL·E in the same API, Whisper, fine-tuning, structured outputs with strict schema validation).