The workload we're costing
Calculations below assume a typical chatbot use case based on public production data (Discord bots, customer-support copilots, consumer chatbots):
| Variable | Value |
|---|---|
| Messages per active user per day | 5 |
| System prompt size | 1,500 tokens |
| Conversation history (avg, in window) | 1,000 tokens |
| User input per message | 80 tokens |
| Response per message | 300 tokens |
| Days/month per active user | 30 |
Adjust these in the calculator for your specific workload — most apps' values are within 30% of these defaults.
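If you want to sanity-check the table below or plug in your own workload, the math is just token counts times price per million. Here is a minimal Python sketch of the uncached case; the function, its defaults, and the example prices are illustrative placeholders, not any provider's actual rates:

```python
# Per-user monthly cost for the workload above, uncached.
# The prices passed in are placeholders; use your provider's current rates.

def monthly_cost_per_user(
    input_price_per_m: float,        # $ per 1M input tokens
    output_price_per_m: float,       # $ per 1M output tokens
    messages_per_day: int = 5,
    system_prompt_tokens: int = 1_500,
    history_tokens: int = 1_000,
    user_input_tokens: int = 80,
    response_tokens: int = 300,
    days_per_month: int = 30,
) -> float:
    messages = messages_per_day * days_per_month
    input_tokens = messages * (system_prompt_tokens + history_tokens + user_input_tokens)
    output_tokens = messages * response_tokens
    return (input_tokens / 1e6) * input_price_per_m + (output_tokens / 1e6) * output_price_per_m

# Example with made-up prices of $0.10/M input and $0.40/M output:
print(f"${monthly_cost_per_user(0.10, 0.40):.4f}/user/month")   # -> $0.0567/user/month
```

Multiply the result by your monthly active user count to get the total bill.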
Cost ranking, with and without caching
Sorted by per-user-per-month cost with system prompt cached (which is what you should actually do):
| Model | Uncached ($/user/mo) | Cached ($/user/mo) | 10K users/mo |
|---|---|---|---|
| Ministral 3B | $0.0173 | $0.0173 | $172.80 |
| GPT-4.1 Nano | $0.0283 | $0.0227 | $227.25 |
| Llama 3.1 8B Instant (Groq) | $0.0229 | $0.0229 | $229.50 |
| GPT-5 Nano | $0.0373 | $0.0272 | $272.25 |
| Gemini 2.5 Flash-Lite | $0.0567 | $0.0398 | $398.25 |
| DeepSeek V4 Flash | $0.0668 | $0.0416 | $415.80 |
| Mistral Small 3 | $0.0522 | $0.0522 | $522.00 |
| GPT-4o Mini | $0.0851 | $0.0682 | $681.75 |
| GPT-5 Mini | $0.0934 | $0.0709 | $708.75 |
| Gemini 2.5 Flash | $0.229 | $0.178 | $1780 |
| Llama 3.3 70B (Groq) | $0.264 | $0.264 | $2639 |
| Claude Haiku 4.5 | $0.826 | $0.553 | $5528 |
The Uncached and Cached columns are dollars per user per month. The "10K users/mo" column is the total monthly bill for a chatbot serving 10,000 active users at this workload.
Verdict by scale
Solo project / under 1,000 users
Pick on capability, not cost — the bill is too small to matter. At 1,000 users/month a Claude Sonnet 4.6 chatbot with caching runs roughly $1,660/month total. Use the model your evals say is best.
1,000–10,000 users
Cost starts to matter, but quality still matters more. Sweet spot: Gemini 2.5 Flash-Lite, GPT-5 Nano, or Claude Haiku 4.5. All three handle a routine chatbot well and all three are caching-friendly; Flash-Lite and GPT-5 Nano stay well under $1,000/month even at 10,000 users, while Haiku only stays under $1,000 toward the low end of this range (about $5,500/month at 10,000 users).
10,000–100,000 users
Cost dominates. Optimize hard: cache aggressively, cap output length, route obvious queries to small models, escalate only complex queries to the bigger ones. Top picks: GPT-5 Nano ($2723/mo for 100K users) or Gemini Flash-Lite.
100,000+ users
At this scale every cost decision compounds. Consider:
- Multi-model routing — cheap model for FAQ-style queries, escalate to GPT-5 Mini / Claude Sonnet for complex ones (see the routing sketch after this list)
- Volume-tier negotiation — Anthropic, OpenAI, and Google all do enterprise discounts above ~$10K/month spend
- Self-hosted Llama on dedicated GPUs becomes economically viable above ~$30K/month API spend
- Inference-side optimization — drop conversation history older than 10 turns, use smaller responses, structured outputs only when needed
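As a concrete (if toy) illustration of the routing bullet above, here is a sketch of a rule-based router. The model identifiers, thresholds, and keywords are placeholders; many teams use a small classifier model instead of keyword rules:

```python
# Toy multi-model router: send routine queries to a cheap model, escalate the rest.
# Model identifiers and the heuristic below are illustrative, not recommendations.

CHEAP_MODEL = "cheap-small-model"        # e.g. a Nano / Flash-Lite tier model
FLAGSHIP_MODEL = "flagship-model"        # e.g. a Mini / Sonnet tier model

ESCALATION_KEYWORDS = ("refund", "legal", "complaint", "cancel my account")

def pick_model(user_message: str, history_turns: int) -> str:
    looks_complex = (
        len(user_message) > 500                      # long, detailed requests
        or history_turns > 10                        # long back-and-forth threads
        or any(kw in user_message.lower() for kw in ESCALATION_KEYWORDS)
    )
    return FLAGSHIP_MODEL if looks_complex else CHEAP_MODEL
```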
The caching effect (why "cheapest" depends on what you cache)
For chatbots specifically, caching is the single biggest cost lever. The 1.5K-token system prompt is identical for every conversation, which means it's the perfect caching target. Anthropic gives 90% off cached reads; OpenAI gives 50% automatically.
Without caching, Claude Haiku 4.5 with the workload above costs about $0.826/user/month. With Anthropic 5-min caching: $0.553/user/month — a 33% cost reduction. Caching deep dive →
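As a rough sketch of how the cached column above is derived: split the input into the cached system prompt and the fresh per-turn tokens, then bill the cached portion at the discounted read rate. This is an approximation that ignores cache-write surcharges and assumes the cache stays warm between messages:

```python
# Approximate cached per-user monthly cost. Discount and prices are inputs;
# cache-write surcharges are ignored and the cache is assumed to stay warm.

def monthly_cost_cached(
    input_price_per_m: float,
    output_price_per_m: float,
    cached_read_discount: float,     # 0.90 for Anthropic-style, 0.50 for OpenAI-style
    messages_per_day: int = 5,
    system_prompt_tokens: int = 1_500,
    history_tokens: int = 1_000,
    user_input_tokens: int = 80,
    response_tokens: int = 300,
    days_per_month: int = 30,
) -> float:
    messages = messages_per_day * days_per_month
    cached_tokens = messages * system_prompt_tokens                   # billed at the discounted rate
    fresh_tokens = messages * (history_tokens + user_input_tokens)    # billed at the full input rate
    output_tokens = messages * response_tokens
    return (
        (cached_tokens / 1e6) * input_price_per_m * (1 - cached_read_discount)
        + (fresh_tokens / 1e6) * input_price_per_m
        + (output_tokens / 1e6) * output_price_per_m
    )
```

With a 90% read discount, the 1,500-token system prompt (roughly 58% of each request's input) is billed at a tenth of list price, which is where most of the one-third saving on Claude Haiku comes from.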
Common mistakes that blow up chatbot costs
- Sending the entire conversation history on every turn, unbounded. After 50 messages, you're sending a 10K-token prompt on every reply. Set a sliding-window limit (last 10 messages, or last 5K tokens of history); a trimming sketch follows this list.
- Not caching the system prompt. See above — single biggest cost lever.
- Using a flagship model for FAQ-style queries. 90% of customer-support chatbot queries are routine. Route them to a small model and escalate the 10% that need it.
- No output length cap. Without max_tokens, a verbose model can write 4,000 tokens when you wanted 200. Set the cap.
- No daily/monthly per-user quota. A handful of users can run up your bill. Cap at e.g. 100 messages/day per user.
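Here is a minimal sketch of the sliding-window trim from the first bullet. It assumes an OpenAI-style list of role/content message dicts and a crude 4-characters-per-token estimate; in production, count tokens with your model's actual tokenizer, and keep the system prompt out of the window so it stays cacheable:

```python
# Keep only the most recent turns, bounded by both message count and rough token size.
# The 4-chars-per-token estimate is a crude stand-in for a real tokenizer.

def trim_history(messages: list[dict], max_messages: int = 10, max_tokens: int = 5_000) -> list[dict]:
    trimmed, total = [], 0
    for msg in reversed(messages):               # walk newest-to-oldest
        est_tokens = len(msg["content"]) // 4
        if len(trimmed) >= max_messages or total + est_tokens > max_tokens:
            break
        trimmed.append(msg)
        total += est_tokens
    return list(reversed(trimmed))               # restore chronological order
```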
Ranking summary (cached cost, 5 messages/day workload)
The clear winners for chatbot use cases at scale:
- Ministral 3B — $0.0173/user/month
- GPT-4.1 Nano — $0.0227/user/month
- Llama 3.1 8B Instant (Groq) — $0.0229/user/month
- GPT-5 Nano — $0.0272/user/month
- Gemini 2.5 Flash-Lite — $0.0398/user/month
Don't pick on price alone — run a quality eval on your specific chatbot's expected conversations before committing. Even a 10% quality drop in a customer-support context costs more in escalations than it saves in API fees.
FAQ
What's the absolute cheapest LLM for a chatbot in 2026?
Per-user-per-month for a typical chatbot workload (5 messages/day, system prompt cached): Mistral's Ministral 3B is the cheapest outright at about $0.017/user/month even though it lacks caching, with GPT-4.1 Nano, Llama 3.1 8B on Groq, and GPT-5 Nano close behind in the $0.02–0.03 range. For most apps, Gemini 2.5 Flash-Lite or DeepSeek V4 Flash hit the sweet spot of cheap + capable + cached.
How much does ChatGPT cost per user per month if I built it?
Depending on model: roughly $0.03–$5/user/month for active users sending ~5 messages/day. Free OpenAI ChatGPT users likely cost OpenAI ~$0.20–$1.00/month (heavily subsidized by Plus subscribers). Self-built with GPT-5 Nano + caching: ~$0.03/user. Self-built with Claude Sonnet 4.6 + caching: ~$1.65/user.
Why does cached pricing matter so much for chatbots?
Chatbots have very stable system prompts (the personality, behavior rules, and tool definitions don't change between users). With caching, that 1.5K-token system prompt costs 10% of the normal input price on Anthropic (50% on OpenAI) after the first call. Multiply across millions of conversations and it's the difference between profitable and unprofitable.
Can I run a chatbot on Llama 3.1 8B for cheaper?
Yes, on Groq it's ~$0.05/M input and $0.08/M output — among the cheapest options. Quality is meaningfully lower than GPT-5 Mini or Gemini Flash, but for narrow chatbots (FAQ answering, simple lookup), the gap may be acceptable. Always run a quality eval on your actual conversations before deciding on cost grounds.
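Plugging those Groq rates into the same per-user arithmetic used for the table (token counts taken from the workload at the top of the page):

```python
# Llama 3.1 8B on Groq at this page's workload:
# 150 messages/month, 387,000 input tokens and 45,000 output tokens per user.
input_millions, output_millions = 0.387, 0.045
cost = input_millions * 0.05 + output_millions * 0.08   # $/M prices quoted above
print(f"${cost:.4f} per user per month")                # roughly $0.023, matching the table row
```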
How do I budget for unexpected user behavior?
Three buffers: (1) cap conversation length — drop oldest messages once context exceeds 5K tokens; (2) hard token limit on output (max_tokens=500 for chat); (3) per-user daily quota — most users send 0-5 messages/day; outliers send 200+. A 100-message/day cap protects you from the long tail.
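As an illustration of the third buffer, here is a minimal in-memory quota check; the cap value is just an example, and a real deployment would keep the counter in Redis or a database rather than in process memory:

```python
from collections import defaultdict
from datetime import date

DAILY_MESSAGE_CAP = 100   # protects against the long tail of heavy users

_daily_counts: dict[tuple[str, date], int] = defaultdict(int)

def allow_message(user_id: str) -> bool:
    """Return True if the user is still under today's quota, and count the message."""
    key = (user_id, date.today())
    if _daily_counts[key] >= DAILY_MESSAGE_CAP:
        return False          # reject, or fall back to a canned response
    _daily_counts[key] += 1
    return True
```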