Why do AI token costs scale linearly?

Every user request that hits a language model consumes tokens — input tokens for the prompt and output tokens for the response. Unlike shared infrastructure (databases, caches), tokens are consumed per-request with no sharing. Doubling your users doubles your token cost, regardless of scale.

How much do AI tokens cost per month for a product?

Token costs depend on product complexity. Low complexity products (simple chatbots, Q&A) start around $500/month. Medium complexity (multi-step workflows, RAG) runs about $3,000/month. High complexity (agents, code generation, heavy reasoning) starts at $12,000/month. These baselines scale linearly with user count.

What is the difference between Pure AI and Hybrid token ratios?

Pure AI products have a token ratio of 1.0 — all functionality routes through the language model. Hybrid products have a token ratio of 0.40–0.44 — they use traditional infrastructure for deterministic tasks and AI only where it adds value. This 56–60% reduction in token dependence dramatically improves cost trajectory at scale.

How can I reduce AI token costs?

Key strategies include: (1) Response caching for repeated queries, (2) Prompt optimisation to reduce token count, (3) Model tiering — use cheaper models for simple tasks, (4) Moving deterministic logic to traditional code (Hybrid approach), (5) Batching requests where possible, (6) Implementing usage limits and throttling.

The Hidden Risk of Pure AI Products: Token Cost at Scale

I love building Pure AI products. They're addictive to ship. A small team with good prompts can put something in front of users in days that would have taken a SaaS team six months a couple of years ago. The model does the reasoning, the generation, the classification, the extraction, and you get to focus on orchestration and UX. It feels like a cheat code.

And then the cost question shows up. Almost always at the same moment in the journey: somewhere between "we just hit 5,000 users" and the first board meeting where someone asks about gross margin. Because here's the thing nobody puts on the pitch deck — token costs scale linearly. Every new user adds the same marginal cost as the last one. Forever. No discount for scale. And once user growth starts compounding with usage growth, the numbers get uncomfortable very fast. This is the conversation I want to have with you before that board meeting.

Tokens Scale Linearly (1:1 with Users)

This is the bit I find myself drawing on whiteboards over and over, so I'll draw it for you here. Traditional infrastructure follows a power law — it grows at roughly the 0.6 power of user count, which is the closest thing software has to economies of scale. Tokens don't get that gift. Every user request to a language model consumes input tokens (your prompt and context) and output tokens (the model's response), and those are consumed per request, with zero sharing between users.

Infrastructure: cost_multiplier = pow(user_ratio, 0.6) — sublinear, with economies of scale

Tokens: cost_multiplier = user_ratio — linear, no economies of scale

Translate that into real numbers: at 10x users, infrastructure costs roughly 4x more, but tokens cost exactly 10x more. At 200x users, infrastructure has grown 27x while tokens have grown 200x. The gap doesn't close as you grow — it widens with every user you onboard. That's the part I want clients to internalise before they sign anything.

Token Cost Baselines

So what do tokens actually cost in the wild? It depends on what your product is asking the model to do — how much reasoning per call, how big your context windows are, and how often each user triggers an AI request. These are the rough monthly baselines I anchor on when I'm doing back-of-the-envelope sizing for a client:

Complexity	Token Cost/mo (base)	Typical Use Cases
Low	$500	Simple chatbots, Q&A, text classification
Medium	$3,000	Multi-step workflows, RAG, document analysis
High	$12,000	Autonomous agents, code generation, heavy reasoning chains

These are baselines for your starting cohort. Everything above scales linearly from here — which is exactly the problem.

Infrastructure (0.6 Power) vs Tokens (1.0 Power)

Let me show you what this looks like across a real growth curve. Here's how a medium-complexity product's costs evolve as it scales from 500 to 100,000 users, splitting infrastructure and tokens out so you can see how differently they behave:

Users	User Ratio	Infra Multiplier (0.6)	Token Multiplier (1.0)	Gap
500	1x	1.0x	1.0x	0%
1,000	2x	1.52x	2.0x	24%
5,000	10x	3.98x	10.0x	60%
25,000	50x	10.46x	50.0x	79%
100,000	200x	27.30x	200.0x	86%

Read that last row again. 200x users. Infrastructure has grown 27x. Tokens have grown 200x. The "gap" column is the bit that keeps me up at night for clients on Pure AI — it shows how much more expensive tokens become relative to infrastructure at every scale point. That divergence is the hidden risk in one number.

Pure AI (1.0) vs Hybrid (0.40–0.44) Token Ratios

The single number I obsess over with founders is the token ratio — the fraction of their running cost that sits on the linear curve. Here's how the two architectures stack up:

	Pure AI	Hybrid
Token ratio	1.0	0.40 – 0.44
Infra ratio	0.09 – 0.13	0.44 – 0.50
Med. tokens/mo (base)	$3,000	$1,200 – $1,320
Med. tokens/mo at 10x users	$30,000	$12,000 – $13,200
Med. tokens/mo at 200x users	$600,000	$240,000 – $264,000

The risk in one paragraph

Sit with this for a second. A medium-complexity Pure AI product at 200x users is staring at $600,000/month in tokens alone. The Hybrid equivalent lands at $240,000–$264,000/month — and the other half of its cost base (infrastructure) gets the sublinear discount on top. Pure AI doesn't have enough infrastructure to even benefit from that discount. This is the slide that ends the meeting.

The Compounding Problem: User Growth × Usage Growth

Here's the part I almost always have to walk founders through twice, because it's not intuitive the first time. Token cost risk compounds, because two growth rates multiply together:

User growth — more people using your product, the obvious one.
Usage growth — each existing user making more AI calls as they discover features, build habits, or as you ship more AI-powered surface area.

If your users grow at 8% monthly and per-user token consumption grows at just 3% monthly, your effective token growth isn't 8% or 11% — it's roughly 11% compounding. Compounding is the word that does all the damage here.

Scenario	User Growth	Usage Growth	Effective Token Growth	36-Month Multiplier
Best case	3%	0%	3%	2.9x
Realistic	8%	3%	~11%	39x
Aggressive	15%	5%	~21%	1,020x

Read that aggressive row again. A product that started at $3,000/month in tokens is staring at over $3 million/month by month 36. Even the "realistic" scenario turns $3,000 into $117,000. I've seen these numbers personally sink AI-first startups whose founders absolutely did not model the trajectory. The good news is the spreadsheet to avoid this fate takes one afternoon.

How I'd Actually Fix This

The answer isn't to abandon AI — that ship has sailed and honestly you'd be giving up too much capability. The answer is to reduce your token ratio and slow the compounding. These are the levers I reach for with clients, ordered by how much they actually move the needle:

1. Move to a Hybrid Architecture

This is the one I almost always start with. Route the deterministic work — CRUD, dashboards, reporting, scheduling, anything that should be exactly the same every time — through traditional infrastructure. Reserve AI for the work that genuinely benefits from it. Doing this drops your token ratio from 1.0 to roughly 0.40–0.44, which more than halves your token cost trajectory in one move. Nothing else on this list comes close to that.

2. Implement Response Caching

You will be surprised — I am, every time — how many AI calls produce identical or near-identical responses for identical-ish inputs. Cache at the prompt level and you'll typically claw back 20–40% of your token consumption depending on how repetitive your traffic is. Add semantic caching (matching similar-enough queries) and you can push that further. This is the lowest-effort win on the list.

3. Optimise Your Prompts

Shorter prompts use fewer input tokens. More focused prompts generate shorter responses. I've watched teams cut 30–50% off their per-call token consumption from systematic prompt work, with no measurable drop in quality. The things I'd attack first:

Strip redundant context and instructions you copy-pasted in three iterations ago
Move to structured output formats (JSON) instead of free-text essays
Cap response length explicitly — models will happily ramble
Pre-process inputs to shrink the context window before it hits the model

4. Model Tiering

Not every AI call needs your most capable (and most expensive) model. Route simple classification and extraction to small, cheap models — reserve the large reasoning models for the stuff that genuinely needs them. A well-designed tiering layer can drop your average cost per call by 40–60%, and most teams set it up in a sprint.

5. Usage Limits and Throttling

Pin per-user rate limits to your pricing. Free tier gets a fixed token budget. Paid tiers get larger budgets tied to their plan. This doesn't lower your cost per call, but it caps the "usage growth" multiplier — which, as we just saw, is the variable doing the real compounding damage.

6. Batch Processing

Where you don't need real-time, batch. Most providers offer 50% off on batch APIs, and the reduced overhead from fewer calls adds incremental savings on top. The mental shift is "does this user actually need an answer in 800ms, or can we batch it overnight?" — the answer is usually the second one.

My 80/20 rule for this

Move to Hybrid, add basic caching, stop. That alone will typically cut your token cost trajectory by 60–70%. The rest of the list (prompt optimisation, tiering, throttling, batching) is incremental polish — worth doing eventually, but don't sequence those before the two big rocks. Start where the leverage is.

What I Want You to Take Away

Tokens scale linearly. Forever. 10x users = 10x token cost. There's no discount coming.
Infrastructure scales sublinearly. 10x users = ~4x infrastructure cost. The gap between these two curves widens at every scale point, and that gap is the risk.
Pure AI is structurally riskier than Hybrid. It's not a moral judgement — it's just that more of your cost base sits on the linear curve. Token ratio of 1.0 vs 0.40–0.44 is the whole game.
User growth times usage growth compounds. A "realistic" 11% monthly effective growth turns $3K/month into $117K/month in 36 months. That's not a worst-case — it's the middle column.
Start with Hybrid plus caching. If you do nothing else on the mitigation list, do those two. They cut the trajectory by 60–70% and you keep the AI capability.
Fix this before 10,000 users, not after. The cost of changing architecture at 1,000 users is a sprint. The cost at 100,000 users is a rebuild. I've watched both.

Run the Token Cost Trajectory on Your Own Numbers

I built the Cost Analyser to do exactly this projection — plug in your complexity tier and growth rate and you'll see the Pure AI vs Hybrid token trajectory play out over 36 months. Worth twenty minutes before your next architecture conversation.

Open the Cost Analyser →

AI Token Cost Risk Unit Economics Hybrid Architecture Strategy

Suhith Illesinghe

I run Mustard Seeds Group, where I help founders and product teams pick the right architecture for what they're actually trying to build — and then help them ship it. I write about the trade-offs I run into in client work, with the numbers I actually use.

Get in touch →