The big ones
Claude Opus 4.7 launched at $5/$25 (down from Opus 4.6's $15/$75)
Anthropic's flagship dropped to one-third of its previous price on both input and output with the 4.7 launch in late April 2026. Cached reads are $0.50/M (10% of input); cache writes are $6.25/M (5-minute TTL) or $10/M (1-hour TTL). A 1M context window is now standard across both Sonnet 4.6 and Opus 4.7.
Caveat: the new Claude 4.x tokenizer consumes about 35% more tokens than older Claudes for the same English text. Per-token prices look great, but per-task bills run roughly 35% above the sticker. Net of tokenizer overhead, the real cost reduction vs Opus 4.6 is around 55% rather than 67%. Still meaningful.
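The tokenizer caveat above is easy to get wrong in a spreadsheet, so here is the arithmetic as a minimal sketch. The prices and the 35% overhead figure come from this post; the helper function itself is just an illustration, not any provider's API.

```python
# Effective per-task cost change when a price cut is partially offset by a
# tokenizer that uses more tokens for the same text.
def effective_cost_ratio(old_price: float, new_price: float,
                         token_inflation: float) -> float:
    """Ratio of new per-task cost to old per-task cost.

    token_inflation is the fractional increase in token count for the
    same text (e.g. 0.35 for 35% more tokens).
    """
    return (new_price / old_price) * (1.0 + token_inflation)

# Opus 4.6 -> 4.7 input price ($15/M -> $5/M) with ~35% tokenizer overhead:
ratio = effective_cost_ratio(15.0, 5.0, 0.35)
print(f"effective cost is {ratio:.0%} of the old bill "
      f"({1 - ratio:.0%} real reduction)")
```

The same function works for output tokens ($75 → $25 gives the identical 55% real reduction, since both prices dropped by the same factor).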
GPT-5 family launched at $1.25/$10 (flagship)
OpenAI's GPT-5 launch slotted in below GPT-4.1 ($2/$8) on input but above on output. The Mini / Nano variants undercut the previous lineup significantly:
- GPT-5: $1.25/$10 — flagship, 400K context
- GPT-5 Mini: $0.125/$1.00 — cheaper than GPT-4.1 Mini's $0.40/$1.60
- GPT-5 Nano: $0.05/$0.40 — currently cheapest mainstream OpenAI model
- GPT-5.4: $2.50/$15 — flagship-quality, ties Opus 4.7 on benchmarks
- GPT-5.5 (reasoning): $5/$30 — top of the Artificial Analysis Index
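To see what the tier spread above means per request, here is a small comparison using the listed prices. The 3K-input / 1K-output task size is an assumption about a typical chat-style call, not anything from OpenAI's docs.

```python
# Per-request cost across the GPT-5 tiers listed above.
# Prices are $/M tokens from the post; the task size is hypothetical.
PRICES = {                       # (input $/M, output $/M)
    "GPT-5":      (1.25, 10.00),
    "GPT-5 Mini": (0.125, 1.00),
    "GPT-5 Nano": (0.05,  0.40),
    "GPT-5.4":    (2.50, 15.00),
    "GPT-5.5":    (5.00, 30.00),
}

def request_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of one request at the given token counts."""
    p_in, p_out = PRICES[model]
    return (in_tokens * p_in + out_tokens * p_out) / 1_000_000

for model in PRICES:
    print(f"{model:10s} ${request_cost(model, 3_000, 1_000):.5f} per request")
```

At this mix, Nano comes in at well under a fiftieth of GPT-5.5's per-request cost, which is the practical argument for routing easy traffic down-tier.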
OpenAI o3 cut from $15/$60 to $2/$8 (87% reduction)
The most dramatic single price cut of the period. o3 launched at flagship-reasoning prices in 2025 and was repriced to balanced-tier pricing as GPT-5.5 took over the reasoning crown. o3-mini dropped from $1.10/$4.40 to $0.55/$2.20; its old price point now belongs to o4-mini.
DeepSeek rebranded V3/R1 to V4 Flash/Pro
DeepSeek's flagship line is now V4:
- V4 Flash: $0.14/$0.28, 1M context — replaces V3 ($0.27/$1.10)
- V4 Pro (reasoning): $1.74/$3.48, with a 75% promotional discount through 2026-05-31 → effective $0.435/$0.87
V4 Flash now has automatic disk-based context caching at 10% of input price — no opt-in, no cache-write fee. For high-traffic apps this is competitive with Anthropic's caching at a fraction of the per-token cost.
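A quick sketch of the cache economics described above: hits bill at 10% of the $0.14/M input price, with no write fee to amortize. The traffic volume and cache-hit rate below are assumptions for illustration, not DeepSeek figures.

```python
# Monthly input cost under DeepSeek V4 Flash's automatic caching, as described
# in the post: cached reads at 10% of the input price, no cache-write fee.
INPUT_PRICE = 0.14                  # $/M tokens, V4 Flash input
CACHED_PRICE = INPUT_PRICE * 0.10   # $/M tokens for cache hits

def monthly_input_cost(tokens_per_month: float, cache_hit_rate: float) -> float:
    """Dollar cost of a month's input tokens at a given cache hit rate."""
    hit = tokens_per_month * cache_hit_rate
    miss = tokens_per_month - hit
    return (hit * CACHED_PRICE + miss * INPUT_PRICE) / 1_000_000

# 1B input tokens/month, 80% hitting the cache (hypothetical traffic):
print(f"${monthly_input_cost(1e9, 0.8):.2f}/mo cached "
      f"vs ${monthly_input_cost(1e9, 0.0):.2f}/mo uncached")
```

Because there is no opt-in and no write fee, the savings here are pure upside; with Anthropic-style caching the write fees and TTL management would have to be netted against the same hit rate.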
Mistral Large 3 priced at $0.50/$1.50 (down from Large 2's $2/$6)
Mistral's flagship was cut by 75% on both input and output. Mistral Large 3 is currently the cheapest credible flagship-tier model from a major European provider — relevant for EU-data-residency deployments.
New models added
| Model | Input/Output | Notable |
|---|---|---|
| Gemini 3.1 Pro Preview | $2 / $12 | Ties Opus 4.7 on intelligence; tiered above 200K |
| Gemini 3 Flash Preview | $0.50 / $3 | Newer Flash tier; 1M context |
| Llama 4 Scout (Groq) | $0.11 / $0.34 | ~600 tok/s, MoE architecture |
| Llama 4 Maverick (Together) | $0.27 / $0.85 | 1M context, larger MoE |
| Grok 4 | $3 / $15 | xAI flagship reasoning |
| Grok 4 Fast | $0.20 / $0.50 | 2M context — largest at this price |
| Llama 3.1 405B (Cerebras) | $6 / $12 | ~970 tok/s — 19× Together's speed |
| Qwen 3 Max | $1.04 / $4.16 | AA Index 78; 256K context |
| Codestral | $0.30 / $0.90 | 256K context, code-specialized |
| Ministral 3B | $0.04 / $0.04 | Cheapest Mistral; edge-viable |
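One way to make the table comparable across very different price shapes is a blended $/M figure at a fixed input:output mix. The subset and the 3:1 mix ratio below are assumptions about a typical chat workload; prices are copied from the table.

```python
# Rank a subset of the new models by blended $/M at a 3:1 input:output mix.
SUBSET = {                                   # (input $/M, output $/M)
    "Gemini 3.1 Pro Preview": (2.00, 12.00),
    "Gemini 3 Flash Preview": (0.50, 3.00),
    "Llama 4 Scout (Groq)":   (0.11, 0.34),
    "Grok 4 Fast":            (0.20, 0.50),
    "Qwen 3 Max":             (1.04, 4.16),
    "Ministral 3B":           (0.04, 0.04),
}

def blended_price(prices: tuple[float, float], input_share: float = 0.75) -> float:
    """Weighted $/M: input_share of tokens billed as input, the rest as output."""
    p_in, p_out = prices
    return p_in * input_share + p_out * (1.0 - input_share)

for name, prices in sorted(SUBSET.items(), key=lambda kv: blended_price(kv[1])):
    print(f"{name:24s} ${blended_price(prices):.3f}/M blended")
```

The blended view changes the ordering: Grok 4 Fast's cheap output narrows the gap to Llama 4 Scout more than the raw input prices suggest.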
Models removed / superseded
- DeepSeek V3 → DeepSeek V4 Flash (renamed). Old API IDs still work but are end-of-lifed.
- DeepSeek R1 → DeepSeek V4 Pro (renamed). Same deal: old API IDs still work but are end-of-lifed.
- Mistral Large 2 → Mistral Large 3. The old $2/$6 pricing is gone.
- Gemini 1.5 Pro / Flash — no longer listed on Google's main pricing page (still accessible via API for now).
- Claude Haiku 3.5 — still listed at $0.80/$4 but very low traffic; will likely be removed by end of 2026.
What this means for budgets
Three takeaways for anyone running production LLM workloads:
- Re-cost everything. If your last spend projection was based on prices from Q1 2026 or earlier, it's materially out of date. Run the numbers in the calculator with current rates.
- Don't lock in long-term contracts on current flagship prices. The pattern (GPT-4 → GPT-4o → GPT-4.1 → GPT-5) shows flagship prices drop 50-80% over 12-18 months. Quarterly reviews are the right cadence.
- Caching strategy matters more, not less. Even at the lower flagship prices, a 4K-token system prompt sent on every call adds up. DeepSeek V4 Flash's automatic caching is especially noteworthy — no work to set up.
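The "system prompt adds up" point above can be made concrete. A back-of-envelope, using Opus 4.7's $5/M input price and its 10%-of-input cached-read rate from this post; the call volume is hypothetical.

```python
# What a fixed system prompt costs over a month of calls, with and without
# cached reads billed at 10% of the input price (cache-write fees ignored
# for simplicity -- at high call volume they amortize to near zero).
def prompt_overhead(calls: int, prompt_tokens: int, input_price: float,
                    cached_fraction: float = 0.10) -> tuple[float, float]:
    """Return (uncached, cached) monthly dollar cost of the prompt alone."""
    full = calls * prompt_tokens * input_price / 1_000_000
    return full, full * cached_fraction

# 1M calls/month, 4K-token system prompt, Opus 4.7 input at $5/M:
full, cached = prompt_overhead(1_000_000, 4_000, 5.00)
print(f"uncached ${full:,.0f}/mo vs cached ${cached:,.0f}/mo")
```

Even post-cut, that is a five-figure monthly line item without caching, which is why the strategy matters more as volumes grow, not less.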
Watch list for next month
- Gemini 3 GA pricing — currently in preview; Google often cuts 20-30% at GA
- Claude Haiku 5 — rumored for Q3 2026; historically Haiku launches at a significant discount to its predecessor
- OpenAI volume tier expansion — possible introduction of pre-paid commitment discounts at $10K+/month
- Groq Llama 4 hosting — currently $0.11/$0.34 for Scout; likely to drop further as competitive pressure builds
- DeepSeek V4 Pro post-promo pricing — 75% discount ends 2026-05-31; the post-promo rate of $1.74/$3.48 will be either confirmed or revised
Sources
- Anthropic API pricing
- OpenAI API pricing
- Google Gemini pricing
- DeepSeek pricing
- Groq pricing
- Cerebras pricing
- Artificial Analysis Intelligence Index
- Full pricing table on this site
FAQ
Why did OpenAI cut o3 prices so dramatically ($15/$60 → $2/$8)?
Three drivers: (1) GPT-5 launched, making o3 a step behind on capability, justifying repositioning; (2) inference cost drops as new chips roll out; (3) competitive pressure from DeepSeek R1 (now V4 Pro) at a fraction of the price. The pattern is consistent: yesterday's flagship becomes today's mid-tier at 1/5 the price.
Did Anthropic actually cut Claude Opus prices?
Yes. Claude Opus 4.7 launched at $5/$25 per million input/output tokens — substantially below Opus 4.6's $15/$75. Anthropic appears to be repositioning Opus from a premium-only tier to a competitive flagship matching GPT-5.x and Gemini 2.5/3 Pro on price. Claude 4.x's new tokenizer eats ~35% more tokens than older Claudes, partially offsetting the cut.
Are GPT-5 prices final or will they drop?
OpenAI's typical pattern: launch at one price, cut 30-50% within 6-9 months. GPT-4 launched at $30/$60 in 2023 and is now $2.50/$10 (as GPT-4o) or $2/$8 (as GPT-4.1). Expect GPT-5 to drop similarly — probably to ~$0.80/$6 by end of 2026. Don't lock in long-term pricing on flagship models.
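The GPT-4 history above implies a steep annualized decline. A rough sketch of that decay-rate math; the two-year window between the $30/M launch price and GPT-4.1's $2/M is an approximation, so treat the output as an order-of-magnitude figure.

```python
# Annualized price decline implied by a start price, end price, and elapsed time.
def annual_decline(p_start: float, p_end: float, years: float) -> float:
    """Fractional price decline per year, compounded (CAGR-style)."""
    return 1.0 - (p_end / p_start) ** (1.0 / years)

# GPT-4 input: $30/M (2023) -> $2/M (GPT-4.1, ~2 years later):
print(f"~{annual_decline(30.0, 2.0, 2.0):.0%} decline per year")
```

Applying a decline anywhere near that rate to GPT-5's $1.25/M launch price is what makes the ~$0.80/M end-of-2026 guess above look conservative rather than aggressive.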
What's likely to change next?
Three watch items: (1) Gemini 3 family general availability (currently preview pricing) — Google often undercuts on launch; (2) Claude Haiku 5 (rumored) — historically Haiku launches priced 50-70% below the predecessor's first month; (3) inference-host wars — Groq, Cerebras, and SambaNova are likely to drop Llama hosting prices further as they compete for share.
Where can I see live prices?
The /pricing table on this site is updated as providers change rates — each model has a last_verified date. Provider pages: Anthropic at anthropic.com/pricing, OpenAI at openai.com/api/pricing, Google at ai.google.dev/pricing.