Cost

Coinbase cut its AI spend ~50%. Here's the piece most teams are missing.

Routing to cheaper models is now a CEO priority. But routing safely needs one thing almost nobody has.

All postsJune 20264 min read

Brian Armstrong, Coinbase’s CEO, recently laid out how they keep AI spend flat while token usage grows exponentially - and the result: they’ve cut AI spend nearly in half while usage keeps climbing. The playbook is better defaults (cheaper, often open-weight models), better routing, aggressive caching, lean context, and visibility instead of usage caps. The line that matters most: “Ultimately, humans shouldn’t be choosing models - AI can automate this task.”

He’s right, and the prize is enormous. Research routers like RouteLLM show you can hold ~95% of frontier quality while cutting cost more than 85% by sending each request to the model that’s good enough for it. When the CEO of a company like Coinbase is personally publishing the routing playbook, this has stopped being an engineering nice-to-have. It’s a board-level cost line.

The piece the playbook skips

Read it again and notice what’s missing: how do you know the cheaper model is still good enough?Coinbase routes on heuristics - a frontier model for planning, a cheaper one for execution. That’s a smart guess about task type, not a measurement of quality. And it took elite engineers building custom harnesses to get even that far. Most teams can’t.

This is exactly where cost programs stall. You can see the 85% sitting there, but routing down feels like a gamble - so you keep paying frontier prices “to be safe.” The blocker isn’t the router. The blocker is trust.

Routing safely needs a bar you trust

The thing that turns routing from a gamble into a decision is a quality bar grounded in your own work: a per-task standard, learned from your production traffic, that says whether a given model actually clears it. With that in place, you don’t guess by task type - you route down to the cheapest model that’s measured to pass, and you get an alert the moment quality slips.

That’s what AgentModus builds. The savings Armstrong is talking about are real and available to everyone. The catch is proving the cheaper model still clears your bar - so you can take the savings without flying blind.

See it on your own traffic.

We’ll learn the bar for your tasks and show you the cheapest model that still clears it.

Book a call