The conversations I'm having with clients now are shifting. The question used to be whether you should build AI systems. That argument is over. The question on the table today is what do those AI systems actually cost? — and that's a very different conversation.

Most clients walk into the room with the same assumption: AI systems are cheap, and they'll be cheap forever. It's an easy assumption to make. Tokens feel like rounding-error money when you're prototyping. But there's always a catch, and the catch shows up the moment your user count starts compounding. That's what this article is about — the catch, and how I help clients see it before it lands on their invoice.

The Three Architectures

Traditional SaaS

This is the one I grew up on. Everything runs on your own infrastructure — servers, databases, caches, queues. No token costs, no model dependencies, no surprise invoices when OpenAI changes their pricing. You pay for compute and storage, and those costs scale sublinearly (I'll come back to why that matters). It's the most mature pattern and, frankly, the easiest to sleep at night with.

Pure AI

The seductive one. The product routes most or all functionality through language models — often just an API gateway, auth layer and a thin database around them. The model provider does the heavy lifting, which is why I've seen teams ship a working v1 in weeks instead of months. The catch is that every user request burns tokens, and that bill doesn't get cheaper as you grow.

Hybrid

This is where most of my client work lands, and I don't think that's a coincidence. Use traditional infrastructure for the deterministic stuff (CRUD, dashboards, scheduling, anything you want to be exactly the same every time) and reach for AI only where it genuinely earns its keep — natural language interfaces, recommendations, content generation, classification. It's more work to design well, but it's the architecture I keep recommending because the numbers keep agreeing with me.

Build Cost Comparison

The first question I always get asked is "what does it cost to get to v1?" The honest answer is "it depends on complexity," but the relative ordering doesn't move. Pure AI is the cheapest because you're offloading logic to the model. SaaS is the most expensive because you're writing all of that logic yourself. Hybrid sits in the middle. These are the rough numbers I work with when I'm scoping an engagement:

Complexity SaaS Pure AI Hybrid
Low $40,000 $18,000 $28,000
Medium $200,000 $100,000 $145,000
High $650,000 $338,000 $481,000
Why is Pure AI cheaper to build?

Because the model handles the hard parts. Instead of writing the business logic, validation rules and data pipelines yourself, you write prompts and orchestration. The way I phrase it to founders: with Pure AI you're renting capability (tokens, forever) instead of owning it (code, paid for once). Cheaper today, more expensive every day after.

Running Cost Breakdown

Build cost is a one-time hit. Running cost is the line item that compounds, and in my experience it's the one founders consistently underestimate. There are two components, and they behave wildly differently as you grow.

Infrastructure: Sublinear Scaling (0.6 Power)

This is the bit I love about traditional infrastructure. Costs follow a power law — they grow at the 0.6 power of your user ratio, with a 40% floor. In plain English: doubling your users increases infrastructure cost by about 52%, not 100%. Shared resources like databases, caches and CDNs serve many users at once, so every new user is slightly cheaper to serve than the last. That's the closest thing software has to economies of scale.

Complexity SaaS Infra/mo AI Infra/mo Hybrid Infra/mo
Low $600 $60 – $78 $264 – $300
Medium $4,000 $360 – $520 $1,760 – $2,000
High $16,000 $1,440 – $2,080 $7,040 – $8,000

Tokens: Linear Scaling (1:1)

Tokens are the opposite story, and this is the one I find myself drawing on whiteboards over and over. Token costs scale linearly. No power law. No floor. Every new user adds the same marginal token cost as the last one, forever. If that doesn't sound scary yet, look at what 36 months of compound growth does to a linear cost line further down. This is the single number that has caused the most "oh no" moments in my client meetings.

Complexity SaaS Tokens/mo AI Tokens/mo Hybrid Tokens/mo
Low $0 $500 $200 – $220
Medium $0 $3,000 $1,200 – $1,320
High $0 $12,000 $4,800 – $5,280

Break-Even Analysis

When I sit down with a founder, the break-even question I really want them to ask isn't "when do we recoup the build cost?" — it's "when does the cheaper-to-build option become the more expensive thing to run?" Because Pure AI's token costs are linear, there is always a crossover point where the cheap build advantage gets eaten by the expensive running cost. The only question is whether you hit it in year one or year five.

The pattern I see in every modelling session:

At conservative growth (3% monthly), the crossover takes a while — Pure AI can genuinely look like the right call for 2–3 years. At aggressive growth (15% monthly), it shows up fast — often inside 12–18 months. My rule of thumb: the more confident you are about growth, the less you want to be on Pure AI.

Growth Rate Impact

Growth Rate Monthly Users at 36 Months (from 500) Token Cost Multiplier
Conservative 3% ~1,450 2.9x
Realistic 8% ~8,450 16.9x
Aggressive 15% ~78,600 157x

Sit with that number for a second. At aggressive growth, a Pure AI product starting at $3,000/month in tokens is staring at roughly $471,000/month in tokens alone by month 36. The Hybrid equivalent, at a 0.40–0.44 token ratio, lands around $188,000–$207,000/month — and the infrastructure half of that gets the sublinear discount on top. This is the slide I show when a founder tells me "we'll just figure out the token bill later."

When I'd Recommend Each Architecture

I'd point you at SaaS when…

You're operating somewhere regulated, you genuinely don't need AI features (yes, that's still allowed), or you've watched a competitor get burned by a model provider's pricing change and you don't want that to be you. Highest build cost, most predictable running cost, and you sleep through the night.

I'd point you at Pure AI when…

You're racing for product-market fit, your user base is unlikely to hit five figures, or the model genuinely is the product (chatbots, writing tools, coding assistants). Just — please — model the token cost at your target scale before you commit. I've had this conversation enough times to know the spreadsheet is worth the afternoon.

I'd point you at Hybrid when…

You want intelligent features but you also want to be in business in three years. Hybrid lets you keep the sublinear infrastructure discount on roughly half your cost base while capping your token exposure to 40–44%. If you don't have a strong reason to be anywhere else, this is where I'd start you.

What I Want You to Take Away

  1. Build cost favours AI. Pure AI lands at roughly half the build cost of its SaaS equivalent. If you're cash-constrained or chasing speed, that's a real lever.
  2. Running cost favours SaaS and Hybrid. Sublinear infrastructure scaling is a genuine economy of scale. Tokens will never give you one.
  3. Growth rate is the variable I'd watch most carefully. The more confident you are about scaling, the more aggressively you should be cutting your token ratio.
  4. Hybrid is where I send most teams. It balances all three dimensions. Start here unless you have a specific reason not to.
  5. Don't decide on gut feel. I've watched smart people get this wrong because the spreadsheet felt like overkill. It isn't. Run the numbers for your complexity tier and your growth rate before you commit.

Run the Numbers on Your Own Scenario

I built the Cost Analyser to do exactly this calculation interactively — plug in your complexity, growth rate and starting users, and you'll see the 36-month story play out the way I'd show it on a whiteboard. Have a play before your next architecture conversation.

Open the Cost Analyser →
Suhith Illesinghe
I run Mustard Seeds Group, where I help founders and product teams pick the right architecture for what they're actually trying to build — and then help them ship it. I write about the trade-offs I run into in client work, with the numbers I actually use.
Get in touch →