Thesis

The model is the commodity. Your bar is the moat.

Frontier models are converging and getting cheaper by the month. The durable advantage is the private standard you measure them against.

June 2026 · 4 min read

For two years the strategy was simple: get access to the best model and win. That window is closing. Stanford’s AI Index reports that the gap between open and closed models narrowed from 8% to 1.7% in a single year, and that the cost of GPT-3.5-level inference fell more than 280× in under two years. The top of the leaderboard is now separated by rounding error. Renting intelligence is becoming a procurement decision, not a strategy.

So where does durable advantage live? Increasingly, the people who have thought hardest about it point to the same place: not the model, but the loop you build on top of it.

The value moved up the stack

Satya Nadella calls it ‘token capital’: the AI capability a firm builds and owns, compounding through a private learning loop that “cannot be replicated by simply licensing the same foundation model.” Bain says proprietary intelligence - unique data, encoded workflows, a real evaluations function - is what compounds “from day one.” Sequoia puts it most cleanly: “today’s judgement will become tomorrow’s intelligence.”

They’re all describing the same asset from different angles: a record of what ‘good’ looks like for your work that gets sharper every time your AI runs.

But “evals” alone is not the moat

Here’s the trap. The moment evaluation becomes important, the labs ship it: OpenAI has eval tooling, and Anthropic open-sourced an eval framework. Generic eval mechanics are commoditizing as fast as the models. If your advantage is “we have a dashboard for evals,” it gets absorbed.

The moat is narrower and harder to copy: the private standard built from your own production traffic, plus the routing and regression loop that runs on top of it. Nobody can buy the data your operations generate. Nobody can rebuild the edge cases your users hit. A competitor can license the same model tomorrow; they cannot license your bar.

What this looks like in practice

AgentModus learns what ‘good’ means for each of your tasks from your own traffic, turns it into a benchmark you own, and uses that bar to route every task to the cheapest model that still clears it - flagging regressions before they reach a user. The bar is the durable part. The models plug into it and get swapped out as better or cheaper ones ship.
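For readers who want the mechanics, here is a minimal sketch of the routing rule described above, under loud assumptions: each task already has a bar (a minimum score on its private benchmark), each candidate model has a measured score and a per-call cost, and a "regression" means a model that cleared the bar on the previous run no longer does. `Candidate`, `route`, and `regressions` are hypothetical names for illustration, not AgentModus’s actual API, and the model names, scores, and prices in the usage section are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    """A model under consideration (hypothetical fields, for illustration only)."""
    name: str
    cost_per_call: float  # assumed unit cost per call, any currency
    score: float          # assumed pass rate on the task's private benchmark, 0..1


def route(candidates: List[Candidate], bar: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose benchmark score meets the bar, or None."""
    passing = [c for c in candidates if c.score >= bar]
    if not passing:
        return None  # no model clears the bar; escalate instead of guessing
    return min(passing, key=lambda c: c.cost_per_call)


def regressions(previous: Dict[str, float], current: Dict[str, float],
                bar: float) -> List[str]:
    """Names of models that cleared the bar last run but fall below it now."""
    return sorted(
        name for name, score in current.items()
        if previous.get(name, 0.0) >= bar and score < bar
    )


if __name__ == "__main__":
    # Hypothetical models, prices, and scores, not real measurements.
    models = [
        Candidate("model-large", 0.030, 0.97),
        Candidate("model-medium", 0.008, 0.93),
        Candidate("model-small", 0.001, 0.81),
    ]
    chosen = route(models, bar=0.90)
    print(chosen.name if chosen else "none")  # model-medium

    before = {"model-medium": 0.93, "model-small": 0.81}
    after = {"model-medium": 0.88, "model-small": 0.83}
    print(regressions(before, after, bar=0.90))  # ['model-medium']
```

The design choice the sketch makes explicit is that the bar, not the model, is the fixed input: swapping in a new candidate only means scoring it against the same benchmark and rerunning `route`.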

In Nadella’s words, that’s the test of control in this era: you should be able to swap a generalist model without losing the company-veteran judgment baked into your system. Swap the model; keep the standard.

The models will keep changing. The bar you hold them to is the thing that’s actually yours - and unlike the model, it compounds.

See it on your own traffic.

We’ll learn the bar for your tasks and show you the cheapest model that still clears it.

Book a call