Quick verdict
| Use case | Recommended | Why |
|---|---|---|
| General chat / agent | Claude Sonnet 4.6 or GPT-4.1 | Both balanced-tier. GPT-4.1 cheaper ($2/$8 vs $3/$15). Claude better at coding agents in current evals. |
| Hardest reasoning task | GPT-5.5 (reasoning) or Claude Opus 4.7 | GPT-5.5 tops the Artificial Analysis Index; Opus is close behind, with 1M context as the differentiator. |
| Cheap high-volume calls | GPT-5-nano ($0.05/$0.40) | No Claude equivalent at this price tier. Haiku 4.5 is 12–20× the price. |
| Latency-sensitive UX | GPT-4o-mini or Claude Haiku 4.5 | GPT-4o-mini ~130 tok/s, Haiku 4.5 ~91 tok/s. Both viable; GPT typically faster. |
| Long context (> 400K tokens) | Claude Sonnet 4.6 (1M) | Flagship GPT-5 caps at 400K; Sonnet 4.6 and Opus 4.7 are 1M, and Claude's cached reads make re-using huge contexts cheap. (GPT-4.1 and GPT-5.5 also reach 1M.) |
| Heavy prompt caching | Claude family | 90% off cached reads (vs 50% on OpenAI). The 90% read discount repays the 25% write surcharge after a single cache hit. |
Headline prices
Per-million-token pricing (input/output), before tokenizer adjustment:
| Tier | Anthropic | OpenAI |
|---|---|---|
| Reasoning (top) | — | GPT-5.5 (reasoning): $5/$30 |
| Flagship | Claude Opus 4.7: $5/$25 | GPT-5: $1.25/$10 |
| Balanced | Claude Sonnet 4.6: $3/$15 | GPT-4.1: $2/$8 |
| Fast | Claude Haiku 4.5: $1/$5 | GPT-4o Mini: $0.15/$0.60 |
| Tiny | — | GPT-5 Nano: $0.05/$0.40 |
The tokenizer trap (why Claude is more expensive than it looks)
Claude 4.x ships a new BPE tokenizer. The same English text turns into roughly 35% more tokens than it does under OpenAI's o200k_base tokenizer. Per-token sticker prices look directly comparable; the per-task bill is not.
Worked example for a typical chat turn (1K prompt, 500 output):
| Model | Tokens billed | Cost if billed at 1,000/500 | Real cost (tokenizer-adjusted) |
|---|---|---|---|
| GPT-5 | 1,000 in / 500 out | $0.0063 | $0.0063 |
| Claude Opus 4.7 | 1,350 in / 675 out | $0.0175 | $0.0236 |
| Claude Sonnet 4.6 | 1,350 in / 675 out | $0.0105 | $0.0142 |
| GPT-4.1 | 1,000 in / 500 out | $0.0060 | $0.0060 |
At flagship-vs-flagship, GPT-5 costs $0.0063 per turn vs Claude Opus 4.7 at $0.0236: roughly a 3.7× real-cost gap, noticeably wider than the ~2.8× gap the sticker prices alone suggest.
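To make the adjustment mechanical, here is a back-of-the-envelope helper (the function is ours, not from either SDK; the 1.35× multiplier is the English-text average quoted above and varies by content and language):

```python
def real_cost_per_turn(input_price: float, output_price: float,
                       prompt_tokens: int, output_tokens: int,
                       tokenizer_multiplier: float = 1.0) -> float:
    """Dollar cost of one turn. Prices are per million tokens; token counts
    are o200k_base baseline counts, inflated by the model's tokenizer."""
    tokens_in = prompt_tokens * tokenizer_multiplier
    tokens_out = output_tokens * tokenizer_multiplier
    return (tokens_in * input_price + tokens_out * output_price) / 1e6

# 1K-prompt / 500-output turn, counted with o200k_base:
gpt5 = real_cost_per_turn(1.25, 10.0, 1_000, 500)        # $0.00625
opus = real_cost_per_turn(5.00, 25.0, 1_000, 500, 1.35)  # $0.02363
print(f"GPT-5 ${gpt5:.4f} vs Opus 4.7 ${opus:.4f} ({opus / gpt5:.1f}x)")
```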
Speed: GPT family is faster on average
| Model | Tok/s (avg, public telemetry) |
|---|---|
| GPT-4o Mini | 130 |
| GPT-4.1 Mini | 110 |
| GPT-5 | 100 |
| GPT-4o | 95 |
| Claude Haiku 4.5 | 91 |
| GPT-4.1 | 90 |
| GPT-5.5 (reasoning) | 68 |
| Claude Opus 4.7 | 49 |
| Claude Sonnet 4.6 | 45 |
For latency-sensitive UX (chat-as-you-type, voice agents, autocomplete) the GPT family is generally the better pick. Claude's slower decode suits long-form responses that users read after completion, but the lag is visible in interactive contexts.
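Throughput maps directly onto perceived wait. A quick sketch using the averages above (time-to-first-token, which also differs by provider, is ignored here):

```python
# Seconds to stream a full 500-token response at average throughput.
throughput_tok_s = {
    "GPT-4o Mini": 130,
    "GPT-5": 100,
    "Claude Haiku 4.5": 91,
    "Claude Opus 4.7": 49,
    "Claude Sonnet 4.6": 45,
}
for model, tps in throughput_tok_s.items():
    print(f"{model:18s} {500 / tps:5.1f}s")
# GPT-4o Mini ~3.8s vs Claude Sonnet 4.6 ~11.1s: a user-visible difference.
```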
Intelligence: closer than the marketing suggests
Composite of public benchmarks (Artificial Analysis Intelligence Index v4, LMArena Elo, MMLU/GPQA), rescaled 0–100:
| Model | Intelligence |
|---|---|
| GPT-5.5 (reasoning) | 88 |
| Claude Opus 4.7 | 86 |
| GPT-5.4 | 84 |
| Claude Sonnet 4.6 | 81 |
| GPT-5 | 80 |
| GPT-4.1 | 70 |
| GPT-5 Mini | 65 |
| Claude Haiku 4.5 | 64 |
| GPT-4o | 62 |
| GPT-4o Mini | 52 |
The top five are within 8 points, inside typical run-to-run variance on most evals. Don't pick a model on intelligence score alone unless you've evaluated on your specific task; the cost, speed, and feature differences will matter more in production.
Context window: 1M on both top Claude tiers
| Model | Context window |
|---|---|
| Claude Sonnet 4.6 | 1M |
| Claude Opus 4.7 | 1M |
| GPT-5.5 (reasoning) | 1M |
| GPT-4.1 | 1M |
| GPT-5 | 400K |
| Claude Haiku 4.5 | 200K |
| GPT-4o Mini | 128K |
For massive-document processing or long-running agent workflows that re-read the same context many times, Claude Sonnet 4.6's 1M window plus the 90% caching discount makes it the most economical choice even with the tokenizer overhead; for low-repeat workloads that fit in 400K, the GPT family stays cheaper.
Caching: Anthropic deeper, OpenAI simpler
| Provider | Cached read | Cache write | Mechanism |
|---|---|---|---|
| Anthropic | 10% of input price (90% off) | +25% (5-min TTL) or +100% (1-hour TTL) | Explicit cache_control markers |
| OpenAI | 50% of input price (50% off) | None (automatic, no surcharge) | Automatic on prefixes ≥1024 tokens |
Break-even on Anthropic's 5-min cache vs not caching at all: a single cache hit, since the 90% read saving already outweighs the 25% write surcharge (the sketch below walks through the numbers). If your traffic re-uses system prompts, tool definitions, or document context within a few minutes, Anthropic's caching wins decisively. If your traffic is bursty with varying prefixes, OpenAI's automatic 50% discount with no surcharge is the safer default.
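A sketch of that break-even arithmetic, using Claude Sonnet 4.6's $3/M input price from the table above (the helper functions are ours; assumes the 5-minute TTL surcharge):

```python
def cached_total(input_price: float, prefix_tokens: int, calls: int,
                 write_surcharge: float = 0.25, read_discount: float = 0.90) -> float:
    """Cumulative input cost in dollars: one cache write, then cached reads."""
    per_call = input_price * prefix_tokens / 1e6
    return per_call * (1 + write_surcharge) + per_call * (1 - read_discount) * (calls - 1)

def uncached_total(input_price: float, prefix_tokens: int, calls: int) -> float:
    return input_price * prefix_tokens / 1e6 * calls

# 50K-token stable prefix (system prompt + tool defs + document):
for n in (1, 2, 5, 20):
    print(f"{n:2d} calls: cached ${cached_total(3.0, 50_000, n):.3f}"
          f" vs uncached ${uncached_total(3.0, 50_000, n):.3f}")
# 1 call : $0.188 vs $0.150  (the write surcharge loses)
# 2 calls: $0.203 vs $0.300  (one cache hit already wins)
```

Opting in looks like this on the Anthropic side; the `cache_control` marker is the real Messages API field, while the model id and prompt are placeholders:

```python
import anthropic

LONG_SYSTEM_PROMPT = "...your long, stable system prompt and tool context..."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-6",  # placeholder id for the model discussed here
    max_tokens=512,
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,              # the prefix to cache
        "cache_control": {"type": "ephemeral"},  # 5-minute TTL tier
    }],
    messages=[{"role": "user", "content": "First question against the cached prefix"}],
)
```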
Full caching mechanics: prompt caching deep dive →
Verdict by use case
Building a chatbot
Default to GPT-4.1 or Claude Sonnet 4.6. Both are strong balanced-tier options. GPT-4.1 is cheaper on sticker price, and cheaper still in practice, since it avoids Claude's tokenizer overhead. Claude Sonnet 4.6 has 1M context, which matters if you stuff a lot of history. If latency matters: GPT-4o-mini or Haiku 4.5.
Building a coding agent
Current evals favor Claude Sonnet 4.6 and Opus 4.7 on agentic coding tasks (SWE-bench, terminal-bench). Anthropic's caching of long codebase context pays back fast. GPT-5 is closer than the headlines suggest but historically Claude wins coding benchmarks by 5–10 points.
RAG over large documents
Claude Sonnet 4.6. Above 400K tokens of document it is the only option of the two, since that is GPT-5's cap. Below that, Claude's 90%-off cached reads make each repeat query cheaper at the margin than GPT-5's automatic 50%-off cache, but the higher base price, cache-write surcharge, and ~35% tokenizer overhead mean the cumulative bill only crosses over after many queries against the same document (roughly 15–20 under the assumptions sketched below). For one-shot or low-repeat RAG that fits in GPT's window, the GPT family wins on price.
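The crossover arithmetic, input side only (our helper; assumes the 1.35× tokenizer multiplier, that every query after the first is a cache hit on both providers, and that OpenAI's automatic caching covers the document prefix):

```python
def cumulative_input_cost(price: float, doc_tokens: float, queries: int,
                          first_mult: float, cached_mult: float) -> float:
    """Total input cost: first call at first_mult x price, the rest cached."""
    per_call = price * doc_tokens / 1e6
    return per_call * (first_mult + cached_mult * (queries - 1))

DOC = 300_000  # o200k_base token count; fits GPT-5's 400K window
for n in range(1, 50):
    sonnet = cumulative_input_cost(3.00, DOC * 1.35, n, 1.25, 0.10)
    gpt5 = cumulative_input_cost(1.25, DOC, n, 1.00, 0.50)
    if sonnet < gpt5:
        print(f"Claude cheaper from query {n}")  # query 19 under these assumptions
        break
```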
High-volume cheap calls
GPT-5-nano or GPT-4o-mini. No Claude tier exists below Haiku 4.5's $1/$5 pricing. For workloads in the millions of calls per day, the GPT family is materially cheaper.
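A rough projection of what that gap means per day (the workload shape is assumed: 1K input / 200 output per call, with the 1.35× multiplier applied to Haiku):

```python
CALLS_PER_DAY = 5_000_000  # assumed volume

def daily_cost(in_price: float, out_price: float,
               in_tok: int, out_tok: int, mult: float = 1.0) -> float:
    """Dollars per day; prices per million tokens, counts scaled by tokenizer."""
    per_call = (in_tok * mult * in_price + out_tok * mult * out_price) / 1e6
    return CALLS_PER_DAY * per_call

print(f"GPT-5-nano: ${daily_cost(0.05, 0.40, 1_000, 200):,.0f}/day")        # ~$650
print(f"Haiku 4.5:  ${daily_cost(1.00, 5.00, 1_000, 200, 1.35):,.0f}/day")  # ~$13,500
```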
Hardest reasoning
GPT-5.5 (reasoning) or Claude Opus 4.7 with adaptive thinking enabled. Both top the intelligence rankings. Output costs are deceptive: both bill hidden reasoning tokens that can inflate the visible output bill by 4–10×. Test on your specific task; differences are smaller than benchmark scores suggest.
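Budget for those hidden tokens explicitly. A sketch, assuming billed output runs 4–10× the visible answer (the multiplier depends on the task and the thinking budget you set):

```python
def reasoning_output_cost(output_price: float, visible_tokens: int,
                          hidden_multiplier: float) -> float:
    """Output bill when reasoning tokens are billed but never shown."""
    return output_price * visible_tokens * hidden_multiplier / 1e6

# A 1K-token visible answer on GPT-5.5 (reasoning), $30/M output:
for mult in (1, 4, 10):
    print(f"{mult:2d}x hidden -> ${reasoning_output_cost(30.0, 1_000, mult):.3f}")
# 1x -> $0.030, 4x -> $0.120, 10x -> $0.300 for the same visible answer
```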
Reference
- Anthropic API pricing
- OpenAI API pricing
- Artificial Analysis Intelligence Index
- Full price table for all providers
- Prompt caching deep dive
- Hidden costs of LLM apps — including the tokenizer trap explored above
FAQ
Is Claude or GPT cheaper?
It depends on the tier you compare. Flagship-vs-flagship: GPT-5 ($1.25/$10) is materially cheaper than Claude Opus 4.7 ($5/$25), with Claude's 35% tokenizer overhead making the gap even wider. Balanced-tier: GPT-4.1 ($2/$8) edges out Claude Sonnet 4.6 ($3/$15). Fast-tier: Gemini Flash-Lite undercuts both, but among Claude/GPT only, GPT-4o-mini ($0.15/$0.60) is the cheapest mainstream option. The reasoning-model picture is closer.
Is Claude smarter than GPT?
On the public benchmark composite (Artificial Analysis Intelligence Index v4 + LMArena Elo), GPT-5.5 is currently #1 (88), with Claude Opus 4.7 (86) and Gemini 3.1 Pro (86) tied just behind, then GPT-5.4 (84), Claude Sonnet 4.6 (81), and GPT-5 (80) clustered next. Gaps of 3–5 points are smaller than within-model variance; pick on cost, speed, or specific capability strengths instead.
Why does the same prompt cost more on Claude than the sticker price suggests?
Claude 4.x ships a new BPE tokenizer that produces roughly 35% more tokens than older Claude models or GPT's o200k_base on the same English text. So a prompt OpenAI bills as 1,000 tokens, Claude bills as ~1,350. You compare per-token prices, but the actual API charge is computed on each tokenizer's own count.
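If you want real counts rather than the 35% heuristic, measure both sides. tiktoken's o200k_base is exact for the GPT side; the Anthropic SDK exposes a token-counting endpoint, called here with a placeholder model id:

```python
import tiktoken   # pip install tiktoken
import anthropic  # pip install anthropic

text = open("prompt.txt").read()

gpt_tokens = len(tiktoken.get_encoding("o200k_base").encode(text))

client = anthropic.Anthropic()  # needs ANTHROPIC_API_KEY
count = client.messages.count_tokens(
    model="claude-sonnet-4-6",  # placeholder id for the model discussed here
    messages=[{"role": "user", "content": text}],
)
print(gpt_tokens, count.input_tokens, count.input_tokens / gpt_tokens)
```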
Which is faster, Claude or GPT?
GPT family is faster on average. Average output throughput: GPT-4o-mini ~130 tok/s, GPT-5 ~100 tok/s; Claude Sonnet 4.6 ~45 tok/s, Claude Opus 4.7 ~49 tok/s. Claude Haiku 4.5 (~91 tok/s) is the closest to GPT speeds. For latency-sensitive UX (autocomplete, voice), GPT or a Groq-hosted Llama is typically the better pick.
Which has better caching?
Anthropic's caching gives a deeper discount (90% off cached reads vs OpenAI's automatic 50%), but you pay a 25% surcharge to write the cache, and a single cache hit already repays it. OpenAI's caching is automatic with no surcharge: first call full price, subsequent cache hits 50% off. For high-traffic apps with stable prefixes (system prompts, documents), Anthropic wins on absolute cost. For bursty or varying traffic, OpenAI wins on simplicity.
When would I pick Claude over GPT?
Three cases. (1) You need 1M-token context on a top-tier model: Claude Sonnet 4.6 and Opus 4.7 both have it, while flagship GPT-5 caps at 400K (on the GPT side, only GPT-4.1 and GPT-5.5 reach 1M). (2) You're doing heavy prompt-caching with stable prefixes: Anthropic's 90%-off reads beat OpenAI's 50%. (3) The specific evals on your tasks favor Claude (often code, complex multi-step reasoning, coding agents). Otherwise GPT is usually cheaper and faster.
When would I pick GPT over Claude?
Latency-sensitive apps (chat UIs, voice, autocomplete), price-sensitive workloads at scale (GPT-5-nano at $0.05/$0.40 has no Claude equivalent), and mature ecosystem features (DALL·E in the same API, Whisper, fine-tuning, structured outputs with strict schema validation).