LLM token & cost calculator

Paste a prompt. See token counts and per-request cost across 43+ models, sorted by total. Adjust expected output and cached-input ratio to see how prompt caching changes the picture. Runs in your browser — your prompt is never sent to a server.

[Interactive calculator: controls and results table render here.]

Controls: an example-prompt loader; expected output tokens (reasoning models such as o3 and R1 hide most of their tokens, so actual output may be 4-10× this); cached-input % (caching can drop input cost by 90% — adjust to see savings); a request count (total cost across that many identical calls); and "Show" / "Optimize for" filters.

Table columns: Model, Speed (tok/s), Intel, Input tokens, Input cost, Output cost, Per request, total × requests, Value.

Prices in USD per request. Token counts use tiktoken — exact o200k_base for OpenAI models, cl100k_base as an approximation for other providers — and directional accuracy is fine for budgeting. Speed is average output throughput; Groq / Cerebras hosting can exceed listed values significantly (we list those as separate models). Intelligence is a 0–100 composite from public benchmarks (Artificial Analysis Index v4, lmarena ELO, MMLU/GPQA) — directional. Value blends each dimension on a log scale, weighted by the active preset above the table. Each dimension is normalized to [0,1] on a fixed scale (intel: 0–100, speed: 30–1000 tok/s, cost: $0.04–$50/M blended), so the score is absolute, not relative to current filters. See /pricing for the full price table.
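
For the curious, here is a minimal sketch of how a score like this can be computed, assuming the fixed scales quoted above and simple weighted blending; the names and exact blending details are our illustration, not the site's actual code.

    import math

    # Fixed scales from the note above. The lower bound for "intel" is our
    # assumption (log of 0 is undefined); everything else is as quoted.
    SCALES = {"intel": (1.0, 100.0), "speed": (30.0, 1000.0), "cost": (0.04, 50.0)}

    def log_norm(x, lo, hi):
        """Map x onto [0, 1] on a log scale between fixed bounds."""
        x = min(max(x, lo), hi)
        return (math.log(x) - math.log(lo)) / (math.log(hi) - math.log(lo))

    def value_score(intel, tok_per_s, cost_per_m, weights):
        """Blend intelligence, speed, and (inverted) cost into one score."""
        dims = {
            "intel": log_norm(intel, *SCALES["intel"]),
            "speed": log_norm(tok_per_s, *SCALES["speed"]),
            "cost": 1.0 - log_norm(cost_per_m, *SCALES["cost"]),  # cheaper = better
        }
        total = sum(weights.values())
        return sum(w * dims[k] for k, w in weights.items()) / total

    # A "balanced" preset weighting all three dimensions equally:
    print(value_score(65, 120, 1.10, {"intel": 1.0, "speed": 1.0, "cost": 1.0}))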

What actually drives your LLM bill

For any sufficiently busy product, three things dominate cost. The calculator above shows you the first; the rest of this site explains the others:

  1. Per-token price × tokens. Obvious, but the price gap between Claude Haiku 4.5 and Claude Opus 4.7 is 15× — for many tasks Haiku is good enough and the savings are real. Full price table →
  2. Prompt caching. The biggest lever you're probably ignoring. Re-sending the same system prompt or document context drops to ~10% of the normal input price after the first call (Anthropic), or 50% off automatically (OpenAI). Get this right and your bill can drop 60-80%; the sketch after this list shows the arithmetic. Caching guide →
  3. Output tokens are priced 3-5× higher than input tokens. Forcing short, structured outputs (JSON with a strict schema, single-line summaries) saves real money. Reasoning models (o3, R1) hide additional output tokens you still pay for.
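
A rough sketch of the arithmetic behind those three levers — our own helper, with assumed discount defaults (~90% off cached reads per Anthropic; use 0.5 for OpenAI's automatic discount):

    def request_cost(input_tokens, output_tokens, price_in, price_out,
                     cached_ratio=0.0, cached_discount=0.9):
        """Estimated USD cost of one call. Prices are per million tokens.

        cached_ratio    - fraction of input served from the prompt cache
        cached_discount - 0.9 ~= Anthropic cache reads (about 10% of list
                          price); 0.5 ~= OpenAI's automatic cached input
        """
        fresh  = input_tokens * (1 - cached_ratio) * price_in
        cached = input_tokens * cached_ratio * price_in * (1 - cached_discount)
        output = output_tokens * price_out
        return (fresh + cached + output) / 1e6

    # 20K-token prompt, 80% cache hits, 500-token answer at $3/$15 per M:
    print(request_cost(20_000, 500, 3.0, 15.0, cached_ratio=0.8))  # ~0.0243

At an 80% cache-hit ratio, that 20K-token prompt costs about $0.024 per call instead of $0.0675 uncached — the input side of the bill drops by roughly 70%.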

Topics

Coming soon

FAQ

How accurate are the token counts?

OpenAI models use the exact tiktoken o200k_base tokenizer — counts match the real API. Other providers (Claude, Gemini, Llama, etc.) use proprietary tokenizers that we approximate with cl100k_base; counts are typically within 5%. For exact Anthropic counts, call client.messages.countTokens() in the Anthropic SDK.
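
A quick way to reproduce these counts yourself (Python shown here; the countTokens() call named above is the equivalent method in the Node SDK, and the model name below is illustrative):

    import tiktoken  # pip install tiktoken

    prompt = "Paste a prompt. See token counts and per-request cost."

    # Exact for OpenAI models (o200k_base is the GPT-4o / o-series encoding):
    print(len(tiktoken.get_encoding("o200k_base").encode(prompt)))

    # The cl100k_base approximation used here for other providers:
    print(len(tiktoken.get_encoding("cl100k_base").encode(prompt)))

    # Exact Anthropic count via the API (needs ANTHROPIC_API_KEY):
    import anthropic
    client = anthropic.Anthropic()
    count = client.messages.count_tokens(
        model="claude-sonnet-4-5",
        messages=[{"role": "user", "content": prompt}],
    )
    print(count.input_tokens)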

Why aren't reasoning models (o3, R1) as cheap as they look?

Their output cost includes hidden reasoning tokens you never see — typically 4-10× the visible response. The calculator shows the visible-output cost; multiply by ~5-8 for a realistic estimate of what o3 or R1 actually bills for output.
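
For example (illustrative numbers, not the calculator's): a 500-token visible answer with a 6× hidden-reasoning multiplier bills ~3,000 output tokens; at $8/M output that is $0.024 per call, not the $0.004 the visible text suggests.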

What's the difference between input cost and cached input?

When you re-send the same context (system prompt, document, tool definitions) within minutes/hours, providers charge ~10% of normal input price for the cached portion. For high-traffic apps, prompt caching is the biggest single cost lever — see the caching guide.
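
Worked example (illustrative numbers): a 20K-token system prompt re-sent 1,000 times at $3/M input is 20M input tokens, or $60 uncached; at ~10% cached-read pricing the same traffic costs roughly $6, plus a one-time cache-write premium.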

Why does Gemini's price double above 200K tokens?

Google charges a premium for prompts longer than 200K tokens — $2.50/M input and $10/M output instead of $1.25/$5. The calculator uses the under-200K tier; if your prompt is longer, double those numbers.
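
A sketch of that tier logic, assuming the rates quoted above and that the higher rate applies to the entire prompt once it crosses 200K tokens (the helper name is ours):

    def gemini_pro_input_cost(tokens):
        """Input cost in USD; the higher rate applies to the whole prompt
        once it exceeds 200K tokens (rates as quoted above)."""
        rate = 1.25 if tokens <= 200_000 else 2.50
        return tokens / 1e6 * rate

    print(gemini_pro_input_cost(150_000))  # 0.1875
    print(gemini_pro_input_cost(300_000))  # 0.75: 4x the 150K prompt, not 2x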

Are these prices current?

Each model has a last_verified date in /pricing. Provider pricing changes every few months — always confirm against the provider's pricing page before signing a long-term commitment.