Why we chose token-based pricing over per-call pricing

API pricing is usually simple. One call equals one credit. We tried that for about a week before it fell apart.

Microwave has over 90 endpoints across wildly different categories. A slug generation call completes in under 2ms — it’s pure string manipulation, no external dependencies. A currency conversion call hits a live data feed, normalizes rates, and applies precision rounding — maybe 20ms. A banking BIN lookup queries a 400,000-entry dataset, cross-references against an issuer network, and returns enriched card metadata — closer to 40ms and significantly more infrastructure per call.

Under flat per-call pricing, you have two bad options: price for the expensive calls (and gouge users of simple utilities), or price for the cheap calls (and lose money on every FX conversion). Neither is right. Both are dishonest.

The token model

We assign each endpoint a token cost proportional to the actual work it performs. The cost factors in:

  • Computation time — pure CPU/memory operations vs. I/O-bound work
  • External data dependencies — whether the call requires a live upstream fetch
  • Dataset size — how much data we maintain and query to answer the request
  • Result complexity — simple scalar vs. structured enriched object

A slug generation call costs 1 token. Currency conversion costs 3. Address parsing — which validates against postal databases, normalizes street abbreviations, and infers missing components — costs 3. Banking BIN lookup costs 5.
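As a minimal sketch, the costs quoted above can be held in a fixed lookup table. The endpoint keys (`slug`, `fx_convert`, `address_parse`, `bin_lookup`) and the `token_cost` helper are hypothetical names chosen for illustration, not actual API identifiers. Only the token values come from this post.

```python
# Token costs quoted in this post. The endpoint keys are hypothetical
# names for illustration, not real API identifiers.
TOKEN_COSTS = {
    "slug": 1,
    "fx_convert": 3,
    "address_parse": 3,
    "bin_lookup": 5,
}


def token_cost(endpoint: str) -> int:
    """Return the fixed token cost of one call to `endpoint`."""
    try:
        return TOKEN_COSTS[endpoint]
    except KeyError:
        raise ValueError(f"unknown endpoint: {endpoint!r}") from None
```

Because the table is static, the cost of a call depends only on which endpoint is called, never on the request payload.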

These aren’t arbitrary. We track actual infrastructure cost per endpoint category and calibrate token weights quarterly. If the FX data feed costs three times as much to maintain as the timezone database, that ratio should be visible in the pricing.
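The post does not spell out the calibration procedure, so the following is only one plausible way to turn a cost ratio into a token weight. The rounding rule, the `token_weight` name, and the dollar figures are all assumptions for illustration.

```python
def token_weight(endpoint_cost: float, baseline_cost: float) -> int:
    """Scale an infrastructure cost to a whole-number token weight.

    Assumed rule (not confirmed by the post): divide by the cost of the
    1-token baseline category, round to the nearest integer, and never
    go below 1 token.
    """
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return max(1, round(endpoint_cost / baseline_cost))


# Illustrative numbers only: a data feed costing three times the baseline.
baseline = 100.0
fx_feed = 300.0
weight = token_weight(fx_feed, baseline)  # 3
```

Under this assumed rule, a threefold difference in maintenance cost shows up directly as a threefold difference in token weight.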

What you always know before you call

The token cost for every endpoint is documented. It doesn’t change at runtime. There are no surprise multipliers, no “this request was more complex so we charged more.” You know what a call costs before you make it, and you can calculate your monthly bill from first principles.

This matters more than it sounds. The APIs we’re replacing often have opaque pricing: you get billed for “compute units” or “request credits” that don’t obviously map to actual requests. Our model is explicit: endpoint X costs N tokens, you have Y tokens in your plan, and the math works out.
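To make the "first principles" calculation concrete, here is a rough sketch of a monthly estimate. The endpoint keys, the usage mix, and the 20,000-token plan size are assumptions for illustration. The per-endpoint token values are the ones quoted in this post.

```python
from typing import Mapping

# Hypothetical endpoint keys; token values are the ones quoted in this post.
TOKEN_COSTS = {"slug": 1, "fx_convert": 3, "address_parse": 3, "bin_lookup": 5}


def monthly_tokens(calls_per_month: Mapping[str, int]) -> int:
    """Sum (call count x fixed token cost) over every endpoint used."""
    return sum(TOKEN_COSTS[name] * count for name, count in calls_per_month.items())


# Illustrative usage mix, not real customer data.
usage = {"slug": 10_000, "fx_convert": 2_000, "bin_lookup": 500}
total = monthly_tokens(usage)        # 10_000*1 + 2_000*3 + 500*5 = 18_500
plan_allowance = 20_000              # assumed plan size
remaining = plan_allowance - total   # 1_500
```

Nothing in the estimate depends on runtime behavior, so it can be computed before a single request is sent.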

Batch requests

When you batch multiple items in a single call — for example, parsing 50 addresses in one request — the token cost scales linearly with the item count. A 50-item address parse batch costs 150 tokens (50 × 3). This is intentional: batching is a latency and connection optimization, not a pricing discount. The work done is proportional to the items processed.

This also means batch pricing is predictable. You can estimate a batch job’s cost before sending it.
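The linear rule can be sketched in a few lines. The `batch_cost` helper is a hypothetical name. The 3-token address parse cost and the 50-item example come from the text above.

```python
ADDRESS_PARSE_COST = 3  # tokens per item, as quoted in this post


def batch_cost(item_count: int, cost_per_item: int) -> int:
    """Token cost of a batch: items processed times the per-item cost."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    return item_count * cost_per_item


# The 50-address example from the text: 50 x 3 = 150 tokens.
fifty_addresses = batch_cost(50, ADDRESS_PARSE_COST)
```

There is no batch discount term in the formula, which is exactly what makes the cost of a batch job easy to estimate up front.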

The unexpected benefit

Designing token costs forced us to think rigorously about what each endpoint actually does — and in several cases, led us to simplify implementations that were doing unnecessary work. When you have to justify a token weight, you interrogate the call path.

We found two endpoints early on that we’d overcomplicated: they were making redundant upstream checks that added latency without adding value. Removing those brought the token cost down and made the endpoints faster. The pricing model made us better engineers.

The edge case: ML/embeddings

Text classification and embedding generation are currently priced separately from the standard token model, because the cost structure is meaningfully different — GPU time vs. CPU time, model size, batch size. We document this clearly and are working toward a unified model. For now, if you’re using those endpoints, the pricing page has a dedicated table.

Why not just make everything 1 token?

We considered it. The appeal is simplicity. But it would mean subsidizing expensive calls with revenue from cheap ones, which works only until the mix of calls shifts toward the expensive endpoints. It also misaligns incentives: if FX conversion costs the same as slug generation, nothing discourages unnecessary expensive calls, and we get no economic signal about which categories are load-bearing.

The token model lets us price fairly, stay sustainable, and be honest about what things actually cost. That felt more important than the simplicity of “everything is 1.”