LLM token & cost calculator

Paste a prompt. See token counts and per-request cost across 43+ models, sorted by total. Adjust expected output and cached-input ratio to see how prompt caching changes the picture. Runs in your browser — your prompt is never sent to a server.

[Interactive calculator: controls and results table render here.]

Controls: an example-prompt loader; expected output tokens (reasoning models such as o3 and R1 hide most of their tokens, so actual output may be 4-10× this); cached-input % (caching can drop input cost by 90% — adjust to see savings); a request count (total cost across that many identical calls); and "Show" / "Optimize for" filters.

Table columns: Model, Speed (tok/s), Intel, Input tokens, Input cost, Output cost, Per request, total × requests, Value.

Prices in USD per request. Token counts use tiktoken — exact o200k_base for OpenAI models, cl100k_base as an approximation for other providers — and directional accuracy is fine for budgeting. Speed is average output throughput; Groq / Cerebras hosting can exceed listed values significantly (we list those as separate models). Intelligence is a 0–100 composite from public benchmarks (Artificial Analysis Index v4, lmarena ELO, MMLU/GPQA) — directional. Value blends each dimension on a log scale, weighted by the active preset above the table. Each dimension is normalized to [0,1] on a fixed scale (intel: 0–100, speed: 30–1000 tok/s, cost: $0.04–$50/M blended), so the score is absolute, not relative to current filters. See /pricing for the full price table.
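
For the curious, here is a minimal sketch of how a score like this can be computed, assuming the fixed scales quoted above and simple weighted blending; the names and exact blending details are our illustration, not the site's actual code.

    import math

    # Fixed scales from the note above. The lower bound for "intel" is our
    # assumption (log of 0 is undefined); everything else is as quoted.
    SCALES = {"intel": (1.0, 100.0), "speed": (30.0, 1000.0), "cost": (0.04, 50.0)}

    def log_norm(x, lo, hi):
        """Map x onto [0, 1] on a log scale between fixed bounds."""
        x = min(max(x, lo), hi)
        return (math.log(x) - math.log(lo)) / (math.log(hi) - math.log(lo))

    def value_score(intel, tok_per_s, cost_per_m, weights):
        """Blend intelligence, speed, and (inverted) cost into one score."""
        dims = {
            "intel": log_norm(intel, *SCALES["intel"]),
            "speed": log_norm(tok_per_s, *SCALES["speed"]),
            "cost": 1.0 - log_norm(cost_per_m, *SCALES["cost"]),  # cheaper = better
        }
        total = sum(weights.values())
        return sum(w * dims[k] for k, w in weights.items()) / total

    # A "balanced" preset weighting all three dimensions equally:
    print(value_score(65, 120, 1.10, {"intel": 1.0, "speed": 1.0, "cost": 1.0}))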

What actually drives your LLM bill

For any sufficiently busy product, three things dominate cost. The calculator above shows you the first; the rest of this site explains the others:

  1. Per-token price × tokens. Obvious, but the price gap between Claude Haiku 4.5 and Claude Opus 4.7 is 15× — for many tasks Haiku is good enough and the savings are real. Full price table →
  2. Prompt caching. The biggest lever you're probably ignoring. Re-sending the same system prompt or document context drops to ~10% of the normal input price after the first call (Anthropic), or 50% off automatically (OpenAI). Get this right and your bill can drop 60-80%; the sketch after this list shows the arithmetic. Caching guide →
  3. Output tokens are priced 3-5× higher than input tokens. Forcing short, structured outputs (JSON with a strict schema, single-line summaries) saves real money. Reasoning models (o3, R1) hide additional output tokens you still pay for.
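
A rough sketch of the arithmetic behind those three levers — our own helper, with assumed discount defaults (~90% off cached reads per Anthropic; use 0.5 for OpenAI's automatic discount):

    def request_cost(input_tokens, output_tokens, price_in, price_out,
                     cached_ratio=0.0, cached_discount=0.9):
        """Estimated USD cost of one call. Prices are per million tokens.

        cached_ratio    - fraction of input served from the prompt cache
        cached_discount - 0.9 ~= Anthropic cache reads (about 10% of list
                          price); 0.5 ~= OpenAI's automatic cached input
        """
        fresh  = input_tokens * (1 - cached_ratio) * price_in
        cached = input_tokens * cached_ratio * price_in * (1 - cached_discount)
        output = output_tokens * price_out
        return (fresh + cached + output) / 1e6

    # 20K-token prompt, 80% cache hits, 500-token answer at $3/$15 per M:
    print(request_cost(20_000, 500, 3.0, 15.0, cached_ratio=0.8))  # ~0.0243

At an 80% cache-hit ratio, that 20K-token prompt costs about $0.024 per call instead of $0.0675 uncached — the input side of the bill drops by roughly 70%.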

Topics

Coming soon

FAQ

How accurate are the token counts?

OpenAI models use the exact tiktoken o200k_base tokenizer — counts match the real API. Other providers (Claude, Gemini, Llama, etc.) use proprietary tokenizers that we approximate with cl100k_base; counts are typically within 5%. For exact Anthropic counts, call client.messages.countTokens() in the Anthropic SDK.
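
A quick way to reproduce these counts yourself (Python shown here; the countTokens() call named above is the equivalent method in the Node SDK, and the model name below is illustrative):

    import tiktoken  # pip install tiktoken

    prompt = "Paste a prompt. See token counts and per-request cost."

    # Exact for OpenAI models (o200k_base is the GPT-4o / o-series encoding):
    print(len(tiktoken.get_encoding("o200k_base").encode(prompt)))

    # The cl100k_base approximation used here for other providers:
    print(len(tiktoken.get_encoding("cl100k_base").encode(prompt)))

    # Exact Anthropic count via the API (needs ANTHROPIC_API_KEY):
    import anthropic
    client = anthropic.Anthropic()
    count = client.messages.count_tokens(
        model="claude-sonnet-4-5",
        messages=[{"role": "user", "content": prompt}],
    )
    print(count.input_tokens)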

Why aren't reasoning models (o3, R1) as cheap as they look?

Their output cost includes hidden reasoning tokens you never see — typically 4-10× the visible response. The calculator shows the visible-output cost; multiply by ~5-8 for a realistic estimate of what o3 or R1 actually bills for output.
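
For example (illustrative numbers, not the calculator's): a 500-token visible answer with a 6× hidden-reasoning multiplier bills ~3,000 output tokens; at $8/M output that is $0.024 per call, not the $0.004 the visible text suggests.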

What's the difference between input cost and cached input?

When you re-send the same context (system prompt, document, tool definitions) within minutes/hours, providers charge ~10% of normal input price for the cached portion. For high-traffic apps, prompt caching is the biggest single cost lever — see the caching guide.
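
Worked example (illustrative numbers): a 20K-token system prompt re-sent 1,000 times at $3/M input is 20M input tokens, or $60 uncached; at ~10% cached-read pricing the same traffic costs roughly $6, plus a one-time cache-write premium.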

Why does Gemini's price double above 200K tokens?

Google charges a premium for prompts longer than 200K tokens — $2.50/M input and $10/M output instead of $1.25/$5. The calculator uses the under-200K tier; if your prompt is longer, double those numbers.
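
A sketch of that tier logic, assuming the rates quoted above and that the higher rate applies to the entire prompt once it crosses 200K tokens (the helper name is ours):

    def gemini_pro_input_cost(tokens):
        """Input cost in USD; the higher rate applies to the whole prompt
        once it exceeds 200K tokens (rates as quoted above)."""
        rate = 1.25 if tokens <= 200_000 else 2.50
        return tokens / 1e6 * rate

    print(gemini_pro_input_cost(150_000))  # 0.1875
    print(gemini_pro_input_cost(300_000))  # 0.75: 4x the 150K prompt, not 2x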

Are these prices current?

Each model has a last_verified date in /pricing. Provider pricing changes every few months — always confirm against the provider's pricing page before signing a long-term commitment.