How long are real system prompts?
Most teams underestimate how big production system prompts get. Here's a snapshot of publicly leaked or disclosed system prompts as of April 2026:
| Product | System prompt tokens |
|---|---|
| ChatGPT (GPT-4o, with all tools) | 4,523 |
| Claude.ai (Claude Opus 4.x) | 24,134 |
| Cursor IDE (default agent) | 9,847 |
| v0 by Vercel | 6,201 |
| Bolt.new | 3,475 |
Token counts are cl100k_base approximations. For Claude 4.x the actual API count is roughly 35% higher (Anthropic uses a different tokenizer), so the 24K-token Claude.ai system prompt becomes ~32.5K on the wire.
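The tokenizer adjustment is a flat multiplier, so it folds easily into any cost estimate. A minimal sketch (the 1.35 inflation factor is this article's rough approximation, not an official conversion rate; the function name is ours):

```python
def claude_billed_tokens(cl100k_tokens: int, inflation: float = 1.35) -> int:
    """Estimate Claude's on-the-wire token count from a cl100k_base measurement.

    The ~1.35 inflation factor is a rough approximation, not an official
    conversion rate between tokenizers.
    """
    return round(cl100k_tokens * inflation)

# The 24,134-token Claude.ai prompt bills as roughly 32.6K Claude tokens
claude_billed_tokens(24_134)
```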
Per-call cost: ChatGPT-style 4.5K-token system prompt
Most agentic apps end up around 4–6K tokens once you've added a meaningful set of tools. Cost per request (system prompt only, every turn, no caching):
| Model | Cost per call | 100K calls/month |
|---|---|---|
| GPT-4o Mini | $0.00068 | $67.84 |
| GPT-4.1 | $0.0090 | $904.60 |
| GPT-5 | $0.0057 | $565.38 |
| Claude Haiku 4.5 | $0.0061 | $610.61 |
| Claude Sonnet 4.6 | $0.0183 | $1,832 |
| Claude Opus 4.7 | $0.0305 | $3,053 |
Claude Sonnet 4.6 with this system prompt at 100K calls/month: $1,832/month in system-prompt overhead alone. That's before any user input, output tokens, tool definitions beyond the system prompt, retries, or anything else.
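The table above is just tokens × input price, so it's easy to reproduce or adapt. A quick sketch (per-million prices are illustrative and should be checked against current pricing pages):

```python
def system_prompt_cost(tokens: int, price_per_mtok: float, calls: int = 1) -> float:
    """Input-token cost of sending a system prompt on `calls` requests, no caching.

    `price_per_mtok` is the model's input price in dollars per million tokens.
    """
    return tokens * price_per_mtok / 1_000_000 * calls

# GPT-4.1-style pricing at $2.00/M input tokens (illustrative)
per_call = system_prompt_cost(4_523, 2.00)           # ~$0.0090
monthly  = system_prompt_cost(4_523, 2.00, 100_000)  # ~$904.60
```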
Per-call cost: Claude.ai-style 24K-token system prompt
At the high end (Claude.ai's full leaked prompt at 24K tokens, billed as ~32.5K via Anthropic's tokenizer):
| Model | Cost per call | 100K calls/month |
|---|---|---|
| GPT-4o Mini | $0.0036 | $362.01 |
| GPT-5 | $0.0302 | $3,017 |
| Claude Sonnet 4.6 | $0.0977 | $9,774 |
| Claude Opus 4.7 | $0.163 | $16,290 |
With Claude Opus 4.7 and the full Claude.ai system prompt, that's $16,290/month in system-prompt overhead alone at 100K calls. This is why Anthropic's caching exists.
The caching difference
Cache the system prompt and the math changes dramatically. With Anthropic's 5-minute cache (cached reads at 10% of base price):
| Scenario | Per call | 100K calls/month |
|---|---|---|
| Sonnet 4.6, 4.5K prompt, no caching | $0.0183 | $1,832 |
| Sonnet 4.6, 4.5K prompt, cached (5m) | $0.0018 | $183.18 |
| Savings (4.5K prompt) | | $1,649/month |
| Sonnet 4.6, 24K prompt, no caching | $0.0977 | $9,774 |
| Sonnet 4.6, 24K prompt, cached (5m) | $0.0098 | $977.43 |
| Savings (24K prompt) | | $8,797/month |
For a Claude.ai-scale system prompt at 100K calls/month, caching saves roughly $8,800/month. Real Claude.ai handles many millions of calls per day; at that scale, the savings from caching run into the millions per month.
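Using the same simplification as the table (every read billed at 10% of base, ignoring the first-call cache write, which Anthropic bills at 1.25× base and which is negligible at 100K calls/month), the savings can be sketched as:

```python
def cached_savings(tokens: int, price_per_mtok: float, calls: int,
                   read_discount: float = 0.10) -> float:
    """Monthly savings from prompt caching, using this article's simplification:
    all reads billed at `read_discount` x base price; the 1.25x first-write
    surcharge is ignored (negligible at high call volume)."""
    uncached = tokens * price_per_mtok / 1_000_000 * calls
    cached = uncached * read_discount
    return uncached - cached

# Sonnet-style $3/M pricing, 24K prompt billed as ~32.5K Claude tokens
savings = cached_savings(32_581, 3.00, 100_000)  # ~$8,797/month
```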
Where the tokens go in a real system prompt
Breakdown of a typical 5K-token agentic system prompt:
| Section | Tokens | % of total |
|---|---|---|
| Identity / role / persona | ~150 | 3% |
| Behavior rules / safety guidelines | ~600 | 12% |
| Tool definitions (schemas + descriptions) | ~2,800 | 56% |
| Few-shot examples | ~1,000 | 20% |
| Output formatting rules | ~400 | 8% |
| Date / context injection | ~50 | 1% |
Tools dominate. A 10-tool agent setup easily adds 3–5K tokens just describing what each tool does and what parameters it accepts. This is also why removing unused tools is one of the higher-leverage optimizations.
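To see why tools dominate, here is a hypothetical tool definition in the JSON-schema style used by tool-calling APIs, with a crude characters-divided-by-four token estimate. The tool, its fields, and the 4-chars-per-token heuristic are all assumptions for illustration:

```python
import json

# Hypothetical tool definition in the JSON-schema style used for tool calling
search_tool = {
    "name": "search_orders",
    "description": "Search the order database by customer, date range, or status. "
                   "Returns up to 50 matching orders sorted by most recent.",
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Internal customer ID"},
            "status": {"type": "string", "enum": ["pending", "shipped", "delivered"]},
            "start_date": {"type": "string", "description": "ISO 8601 date"},
            "end_date": {"type": "string", "description": "ISO 8601 date"},
        },
        "required": ["customer_id"],
    },
}

def rough_token_count(obj) -> int:
    """Crude estimate: ~4 characters of serialized JSON per token."""
    return len(json.dumps(obj)) // 4

# Even this single modest tool lands well over 100 tokens;
# ten of them, with richer descriptions, gets you to 3-5K fast.
rough_token_count(search_tool)
```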
Three changes that drop system-prompt cost ~90%
- Cache it. Anthropic's 5-minute cache gives 90% off all reads after the first (cache writes cost 25% extra, but at volume reads dominate). OpenAI applies an automatic 50% discount on cached prefixes. For the Sonnet 4.6 numbers above, this alone is the difference between a $1,832/month system-prompt bill and a $183/month one.
- Move dynamic content out. Date stamps, request IDs, user-specific context — all of these invalidate the cache if they're in the system prompt. Move them into the user message instead.
- Drop unused tools. Each tool definition is 50–500 tokens. If you have 15 tools but most calls only use 3, consider routing to a smaller tool subset based on the user's intent before invoking the model.
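The first two changes work together in Anthropic's Messages API: mark the static system prompt as cacheable with `cache_control`, and push anything dynamic into the user message so it never busts the cache. A sketch of the request payload (the model name and prompt text are placeholders):

```python
from datetime import date

STATIC_SYSTEM_PROMPT = "You are a support agent for Acme Corp. ..."  # placeholder

def build_request(user_message: str) -> dict:
    """Anthropic Messages API payload: static system prompt marked cacheable,
    dynamic content (the date) kept in the user turn."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                # Anthropic's 5-minute ephemeral cache: reads at 10% of base price
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {
                "role": "user",
                # Date stamp lives here, not in the system prompt
                "content": f"Today is {date.today().isoformat()}.\n\n{user_message}",
            }
        ],
    }
```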
Real leaked system prompts to study
These are publicly accessible via the jujumilk3/leaked-system-prompts and asgeirtj/system_prompts_leaks GitHub repositories — useful for understanding what production systems actually look like:
- ChatGPT (OpenAI) — 4.5K tokens; tool definitions for DALL-E, code interpreter, web browsing
- Claude.ai (Anthropic) — 24K tokens; very detailed behavior + tool descriptions
- Cursor — 9.8K tokens; code-editing agent with file/terminal/search tools
- v0 (Vercel) — 6.2K tokens; React component generation with strict output formatting
- Bolt.new — 3.5K tokens; lightest of the major coding agents
- Perplexity, Microsoft Copilot, Devin, GitHub Copilot — all leaked, all worth reading for prompt-engineering reference
Reading these is the fastest education in production prompt engineering. The patterns repeat across products — once you've seen 3 of them you've seen the template.
FAQ
How is a system prompt billed?
Same as user input — every token in the system prompt is counted as input tokens on every API call. There's no special 'system prompt' price; the system role just tells the model how to behave. For an agent making 100 calls per session, a 5,000-token system prompt costs you 500,000 input tokens of overhead.
Does caching the system prompt help?
Massively. On Anthropic, caching a 5,000-token system prompt drops the cost from $0.015/call to $0.0015/call after the first call (90% off cached reads, 5-min TTL). On OpenAI, automatic prefix caching gives 50% off after the first call within ~5-10 minutes. The system prompt is the single best caching target in any production app — see /prompt-caching.
Why are real system prompts so long?
Three reasons: (1) tool definitions — every tool the model can call must be described in the prompt, with name, description, and parameter schema; (2) safety / behavior rules accumulate over time as edge cases emerge; (3) examples — many production prompts include 5-20 few-shot examples to constrain output style. Cursor, v0, and Bolt's prompts are mostly tool definitions and behavior rules.
Can I just trim the system prompt to save money?
Sometimes. Common wins: remove unused tools (each saves 50-200 tokens), consolidate behavior rules, drop few-shot examples that the model handles fine without. But before trimming, set up caching first — it's typically a 90% cost reduction, while trimming is usually 10-20%. Cache first, optimize second.
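The "cache first" advice above is just arithmetic. A quick comparison, using the Sonnet 4.6 / 4.5K-prompt figure from earlier in the article and assuming (hypothetically) that trimming can remove 15% of the prompt:

```python
base = 1832.0  # $/month: Sonnet 4.6, 4.5K prompt, 100K calls, no caching

trim_only  = base * (1 - 0.15)        # assumed 15% trim, no caching
cache_only = base * 0.10              # Anthropic cached reads at 10% of base
both       = base * (1 - 0.15) * 0.10 # trim, then cache the smaller prompt

print(f"trim only:  ${trim_only:,.0f}/mo")   # $1,557
print(f"cache only: ${cache_only:,.0f}/mo")  # $183
print(f"both:       ${both:,.0f}/mo")        # $156
```

Caching alone beats even an aggressive trim by a wide margin, which is why it comes first.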
Is the leaked Claude.ai system prompt really 24K tokens?
Yes. The April 2025 leak shows Claude Opus 4's full Claude.ai system prompt at ~24,000 tokens — a substantial portion of which is detailed tool descriptions for web search, computer use, code execution, and artifacts. Anthropic itself publishes versions of its production system prompts in its release notes, which makes figures like this straightforward to cross-check.