Token Pricing / News

LLM pricing changes — May 2026

The pricing landscape moved hard over the last 60 days. Two flagships repriced, one entire family rebranded, three new models launched, and one major price war started. Here's everything that changed and what it means for budgets.

See the full updated table →

The big ones

Claude Opus 4.7 launched at $5/$25 (down from Opus 4.6's $15/$75)

Anthropic's flagship dropped 3× on input and 3× on output with the 4.7 launch in late April 2026. Cached reads at $0.50/M (10% of input). Cache writes scale to $6.25/M (5-min TTL) or $10/M (1-hour TTL). 1M context window now standard across both Sonnet 4.6 and Opus 4.7.

Caveat: the new Claude 4.x tokenizer consumes about 35% more tokens than older Claudes for the same English text. Per-token prices look great; per-task bills are roughly 35% above the sticker. Net of tokenizer overhead, real cost reduction vs Opus 4.6 is around 55% rather than 67%. Still meaningful.

GPT-5 family launched at $1.25/$10 (flagship)

OpenAI's GPT-5 launch slotted in below GPT-4.1 ($2/$8) on input but above on output. The Mini / Nano variants undercut the previous lineup significantly:

GPT-5: $1.25/$10 — flagship, 400K context
GPT-5 Mini: $0.125/$1.00 — cheaper than GPT-4.1 Mini's $0.40/$1.60
GPT-5 Nano: $0.05/$0.40 — currently cheapest mainstream OpenAI model
GPT-5.4: $2.50/$15 — flagship-quality, ties Opus 4.7 on benchmarks
GPT-5.5 (reasoning): $5/$30 — top of the Artificial Analysis Index

OpenAI o3 cut from $15/$60 to $2/$8 (87% reduction)

The most dramatic single price cut of the period. o3 launched at flagship-reasoning prices in 2025 and was repriced to balanced-tier pricing as GPT-5.5 took over the reasoning crown. o3-mini shifted to $0.55/$2.20 from its previous $1.10/$4.40 (those prices are now o4-mini's).

DeepSeek rebranded V3/R1 to V4 Flash/Pro

DeepSeek's flagship line is now V4:

V4 Flash: $0.14/$0.28, 1M context — replaces V3 ($0.27/$1.10)
V4 Pro (reasoning): $1.74/$3.48, currently 75% promo through 2026-05-31 → effective $0.435/$0.87

V4 Flash now has automatic disk-based context caching at 10% of input price — no opt-in, no cache-write fee. For high-traffic apps this is competitive with Anthropic's caching at a fraction of the per-token cost.

Mistral Large 3 priced at $0.50/$1.50 (down from Large 2's $2/$6)

Mistral's flagship cut by 75% on input and 75% on output. Mistral Large 3 is currently the cheapest credible flagship-tier model from a major European provider — relevant for EU-data-residency deployments.

New models added

Model	Input/Output	Notable
Gemini 3.1 Pro Preview	$2 / $12	Ties Opus 4.7 on intelligence; tiered above 200K
Gemini 3 Flash Preview	$0.50 / $3	Newer Flash tier; 1M context
Llama 4 Scout (Groq)	$0.11 / $0.34	~600 tok/s, MoE architecture
Llama 4 Maverick (Together)	$0.27 / $0.85	1M context, larger MoE
Grok 4	$3 / $15	xAI flagship reasoning
Grok 4 Fast	$0.20 / $0.50	2M context — largest at this price
Llama 3.1 405B (Cerebras)	$6 / $12	~970 tok/s — 19× Together's speed
Qwen 3 Max	$1.04 / $4.16	AA Index 78; 256K context
Codestral	$0.30 / $0.90	256K context, code-specialized
Ministral 3B	$0.04 / $0.04	Cheapest Mistral; edge-viable

Models removed / superseded

DeepSeek V3 → DeepSeek V4 Flash (renamed). Old API IDs still work but are end-of-lifed.
DeepSeek R1 → DeepSeek V4 Pro (renamed). Same.
Mistral Large 2 → Mistral Large 3. The old $2/$6 pricing is gone.
Gemini 1.5 Pro / Flash — no longer listed on Google's main pricing page (still accessible via API for now).
Claude Haiku 3.5 — still listed at $0.80/$4 but very low traffic; will likely be removed by end of 2026.

What this means for budgets

Three takeaways for anyone running production LLM workloads:

Re-cost everything. If your last spend projection was based on prices from Q1 2026 or earlier, it's materially out of date. Run the numbers in the calculator with current rates.
Don't lock in long-term contracts on current flagship prices. The pattern (GPT-4 → GPT-4o → GPT-4.1 → GPT-5) shows flagship prices drop 50-80% over 12-18 months. Quarterly reviews are the right cadence.
Caching strategy matters more, not less. Even at the lower flagship prices, a 4K-token system prompt sent on every call adds up. DeepSeek V4 Flash's automatic caching is especially noteworthy — no work to set up.

Watch list for next month

Gemini 3 GA pricing — currently in preview; Google often cuts 20-30% at GA
Claude Haiku 5 — rumored launch in Q3 2026; historically Haiku launches at significant discount to predecessor
OpenAI volume tier expansion — possible introduction of pre-paid commitment discounts at $10K+/month
Groq Llama 4 hosting — currently $0.11/$0.34 for Scout; likely to drop further as competitive pressure builds
DeepSeek V4 Pro post-promo pricing — 75% discount ends 2026-05-31; the post-promo rate of $1.74/$3.48 will be either confirmed or revised

Sources

FAQ

Why did OpenAI cut o3 prices so dramatically ($15/$60 → $2/$8)?

Three drivers: (1) GPT-5 launched, making o3 a step behind on capability, justifying repositioning; (2) inference cost drops as new chips roll out; (3) competitive pressure from DeepSeek R1 (now V4 Pro) at a fraction of the price. The pattern is consistent: yesterday's flagship becomes today's mid-tier at 1/5 the price.

Did Anthropic actually cut Claude Opus prices?

Yes. Claude Opus 4.7 launched at $5/$25 per million input/output tokens — substantially below Opus 4.6's $15/$75. Anthropic appears to be repositioning Opus from a premium-only tier to a competitive flagship matching GPT-5.x and Gemini 2.5/3 Pro on price. Claude 4.x's new tokenizer eats ~35% more tokens than older Claudes, partially offsetting the cut.

Are GPT-5 prices final or will they drop?

OpenAI's typical pattern: launch at one price, cut 30-50% within 6-9 months. GPT-4 launched at $30/$60 in 2023 and is now $2.50/$10 (as GPT-4o) or $2/$8 (as GPT-4.1). Expect GPT-5 to drop similarly — probably to ~$0.80/$6 by end of 2026. Don't lock in long-term pricing on flagship models.

What's likely to change next?

Three watch items: (1) Gemini 3 family general availability (currently preview pricing) — Google often undercuts on launch; (2) Claude Haiku 5 (rumored) — historically Haiku launches priced 50-70% below the predecessor's first month; (3) inference-host wars — Groq, Cerebras, and SambaNova are likely to drop Llama hosting prices further as they compete for share.

Where can I see live prices?

The /pricing table on this site is updated as providers change rates — each model has a last_verified date. Provider pages: Anthropic at anthropic.com/pricing, OpenAI at openai.com/api/pricing, Google at ai.google.dev/pricing.

Re-cost your workload with current prices →