The fourth layer: why coding agents are about to become your biggest AI bill

On June 1, 2026, GitHub Copilot moved every plan over to usage-based billing. You pay by the token now, at each model's API rate, drawn from a monthly credit pool. A lot of the coverage treated it like a shock. I didn't, because none of this is new.

The cloud has billed this way for the better part of two decades. You pay for what you use, metered by the hour, the request, the gigabyte. SaaS has been drifting the same way for years, off flat per-seat licenses and onto usage. So Copilot moving to tokens wasn't a surprise to me. It was just the last big coding tool to do the obvious thing. Cursor got there in June 2025, and Claude Code billed on tokens from day one.

So the part I actually care about isn't that the pricing changed. It's where the meter finally landed: on the coding agent your engineers run all day. That's the fourth layer of AI spend. And unlike the three layers under it, almost nobody is set up to watch it yet.

That's the shift worth your attention. A seat was a fixed number you could plan around. Usage isn't, and the bill now tracks how hard your team works. The cloud, the data platforms, and production AI all got instrumented years ago. This layer hasn't.

What do I mean by the fourth layer?

AI spend showed up in waves. Cloud infrastructure first. Then the data platforms sitting on top of it. Then production AI, the models running inside your product, billed by the token. Every one of those caught finance and engineering off guard before anyone had the tools to see it.

The fourth wave is the developer endpoint: the coding agents your engineers open every day. Claude Code, Cursor, Copilot, Codex, Windsurf. It's the newest of the four, it's growing the fastest, and it's the least watched. That pricing change just turned it into a metered compute budget that, in most orgs, nobody is actually metering.

And it's not just engineering anymore

We keep talking about this as a developer problem, and most of the volume still is. But that framing is already out of date. I'm a marketer, and I'm burning through tokens too: building websites and landing pages, drafting a blog series like this one, working up strategy and competitive positioning, writing scripts, producing video. Product managers are prototyping with the same tools. So are designers, analysts, ops. The moment a tool meters tokens, anyone who points it at real work is running up spend.

So the assumption sitting under most budgets, that this is the CTO's line item, is already wrong. It's everyone's. And that makes it harder to see, not easier, because the spend is scattered across functions nobody thought to instrument.

Nobody actually knows what a prompt costs

Here's a question almost no engineer can answer: what did that last prompt cost? Not roughly. Exactly.

A single prompt is usually cheap, a few cents. But that's not how anyone works anymore. You hand the agent a task and it goes off and plans, reads half the repo, runs some tools, writes a diff, second-guesses itself, and tries again. That's an agentic session, and it can cost many times what a single prompt does. The same developer, in the same afternoon, can run a two-cent completion and a thirty-dollar agent session back to back and feel no difference while they're doing it. The cost lands later, on a bill, cut off from the moment that caused it.

So no single developer using these tools has predictable costs. Not because they're careless, but because nothing in the experience tells them what anything costs while they're in it.
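To make that gap concrete, here is a rough sketch of the arithmetic. It assumes the Sonnet rates quoted later in this piece ($3 in, $15 out per million tokens), and every token count in it is a number I made up for illustration. It also assumes each step of an agent loop re-sends the accumulated context at full price, which ignores prompt caching and any tool-specific discounts, so treat it as an upper-bound picture rather than a measurement.

```python
# Assumed rates: Sonnet's per-million-token prices from the table in this article.
INPUT_RATE = 3.00    # USD per million input tokens
OUTPUT_RATE = 15.00  # USD per million output tokens


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one model call at the assumed rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


def session_cost(steps: int, base_context: int,
                 growth_per_step: int, output_per_step: int) -> float:
    """Total cost of an agent loop where every step re-sends the growing context.

    Simplification: no prompt caching, so each step pays full price for
    everything accumulated so far.
    """
    total = 0.0
    context = base_context
    for _ in range(steps):
        total += call_cost(context, output_per_step)
        # Tool results and the model's own output get appended to the context.
        context += growth_per_step + output_per_step
    return total


# Hypothetical single completion: 4k tokens in, 800 tokens out.
single = call_cost(4_000, 800)

# Hypothetical agent session: 45 steps, starting from a 20k-token context
# that grows by 8k tokens of file and tool output per step.
session = session_cost(steps=45, base_context=20_000,
                       growth_per_step=8_000, output_per_step=1_000)

print(f"single prompt: ${single:.3f}")
print(f"agent session: ${session:.2f}")
print(f"ratio: {session / single:.0f}x")
```

With these made-up numbers the single prompt comes out around two cents and the session around thirty dollars, more than a thousand times apart. The driver is the input side: the context gets re-read at every step, so cost grows much faster than the number of steps.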

The work isn't static, so the spend isn't either

This is the part I find genuinely interesting, and it's why I don't buy "just forecast it" as an answer.

The way we use these tools doesn't sit still. We generate ideas, we riff on a concept, we get inspired, and that motivation turns straight into tokens. One good session pulls you into three more. Someone ships a new model on Tuesday and by Thursday there's a use case for it that didn't exist on Monday. You hit a wall on quality and reach for a bigger, pricier model to get unstuck. None of that is waste. It's the work. But every bit of it moves your spend.

So the things that decide your cost are all moving at once. How we use the tools changes. The models we reach for change. Even the output quality changes, which changes how hard we push. Predictability assumes the thing you're measuring holds still. This one doesn't.

So predictability was never really the goal

If the work is creative and the landscape keeps shifting, a fixed monthly number per developer was always going to be a fiction. And chasing it is the wrong instinct, because the only way to force predictability is to clamp down, and clamping down on the highest-return tool your engineers have is a bad trade.

The goal was never to make the spend predictable. It can't be. What you can do is make it visible: see what actually happened, attribute it, and price it after the fact, so the surges and the model switches become things you understand instead of things you find on an invoice.
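Here is roughly what "attribute it and price it after the fact" means in practice, as a minimal sketch. The record shape (team, model, input_tokens, output_tokens) is hypothetical, since each vendor exports usage in its own format, and the rates are the published per-million-token prices from the table later in this article. The sample records are invented.

```python
from collections import defaultdict

# Per-million-token rates (input, output) in USD, taken from this article's table.
RATES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def attribute_spend(records):
    """Price each usage record and roll it up by (team, model)."""
    spend = defaultdict(float)
    for r in records:
        rate_in, rate_out = RATES[r["model"]]
        cost = (r["input_tokens"] * rate_in
                + r["output_tokens"] * rate_out) / 1_000_000
        spend[(r["team"], r["model"])] += cost
    return dict(spend)


# Invented usage records in a hypothetical, vendor-neutral shape.
records = [
    {"team": "platform", "model": "Claude Opus 4.8",
     "input_tokens": 2_000_000, "output_tokens": 100_000},
    {"team": "platform", "model": "Claude Opus 4.8",
     "input_tokens": 1_000_000, "output_tokens": 40_000},
    {"team": "marketing", "model": "Claude Sonnet 4.6",
     "input_tokens": 500_000, "output_tokens": 50_000},
    {"team": "platform", "model": "Claude Haiku 4.5",
     "input_tokens": 3_000_000, "output_tokens": 200_000},
]

spend = attribute_spend(records)

# Largest line items first, so the surprises are at the top.
for (team, model), cost in sorted(spend.items(), key=lambda kv: -kv[1]):
    print(f"{team:<10} {model:<18} ${cost:.2f}")
```

Nothing here predicts anything. It just turns raw token counts into a dollar figure with a team and a model attached, which is the part most orgs are missing.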

The biggest variable is still the model

Of all those moving parts, the model you run moves the bill more than anything else, and it's the one almost nobody is watching. Cursor put real numbers on this when it changed pricing in June 2025. The same $20 of included credit went two to three times further depending only on which model you picked.

Today's per-token rates say the same thing. A frontier model like Claude Opus runs about five times the cost of a small, fast one like Claude Haiku. (Copilot's old request-based scheme made the gap look even wider, with multipliers up to 27x for a frontier Claude model and 57x for a top OpenAI model. Those were legacy annual plans, retired with token billing on June 1, but they're a reminder of how differently these models have always been priced.)

So what do the popular models actually cost?

The three big tools now bill on the same basis, the model's published API rate, so a given model costs about the same wherever you run it. What moves your bill is the per-token rate of the model itself, quoted here in dollars per million tokens, in and out:

Model	Input (per 1M tokens)	Output (per 1M tokens)
Claude Haiku 4.5	$1	$5
GPT-5.3-Codex	$1.75	$14
Gemini 3.1 Pro	$2	$12
Claude Sonnet 4.6	$3	$15
Claude Opus 4.8	$5	$25

Claude Sonnet costs about three dollars per million input tokens whether you run it in Copilot, Cursor, or Claude Code, because they're all billing at the model's API rate. The difference between the tools is mostly packaging. The difference between models inside any one tool is five times or more. In practice a typical agentic task runs a little over a dollar on Sonnet and closer to two on Opus, and teams run thousands of those a month. So the thing worth paying attention to isn't the logo on the editor. It's which model is doing the work, and hardly anyone is looking.
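If you want to check the per-task math yourself, a small sketch like this does it. The rates are the ones in the table above. The task size (300k tokens read, 20k tokens written) is my own assumption, picked as a plausible agentic task rather than measured, and the calculation ignores caching discounts and anything a tool adds on top of the API rate.

```python
# Per-million-token rates (input, output) in USD, from the table above.
RATES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.3-Codex": (1.75, 14.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the model's published API rate."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


# Assumed task size, not a measurement.
INPUT_TOKENS = 300_000
OUTPUT_TOKENS = 20_000

for model in RATES:
    cost = task_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model:<18} ${cost:.2f} per task")
```

At that task size Sonnet lands at about $1.20 and Opus at $2.00, with Haiku at a fifth of the Opus figure. Multiply by a few thousand tasks a month and the model choice is the line item.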

Later in this series I break this down tool by tool, and I put real numbers on it with a model-by-model index of what one fixed task costs on each.

Why your FinOps tools miss all of this

The cloud cost platforms you're probably already running were built for a different problem. They read provisioned resources, they lean on tags, and they assume the thing spending money is infrastructure you stood up. A coding agent is none of that. It's usage at a developer's keyboard, agentless, spread across vendors that each report spend in their own units, when they report it usefully at all. The incumbents aren't wrong. They're just pointed at the first three layers.

What covering the fourth layer looks like

At PointFive we map AI spend across all four layers: production AI, data platforms, cloud infrastructure, and the developer endpoint. The fourth is where TokenShift lives. It pulls coding agent spend across Claude Code, Codex, Cursor, Copilot, and Windsurf into one view, attributed to teams, broken out by model, and tied back to the work it produced. The model breakdown is the whole point. The moment you can see one team running a frontier model for work a lighter one would handle fine, the saving stops being a mystery and becomes a decision.

I'm not trying to talk anyone out of coding agents. They're some of the highest-return tools an engineering org has ever picked up, and the answer to usage-based pricing is visibility, not a clampdown. Once you can see it, the math gets better on its own. Our customers at Nubank saw ROI in about ten days, and across the platform the average sits above 1,200 percent.

If there's one thing I'd leave you with, it's that the cheapest win on the table right now is simply seeing and pricing the layer nobody else is looking at.

FAQ

What changed in coding agent pricing in 2026? The major tools moved from seat-based subscriptions to usage-based billing tied to token consumption. GitHub Copilot switched all plans on June 1, 2026, and Cursor made a similar move in June 2025. What you pay now depends on how much you use and which model you use, not just how many seats you bought.

What does a single prompt actually cost? A single prompt is usually a few cents. An agentic session, where the tool plans, reads, runs tools, and revises on its own, can cost many times more, sometimes thirty or forty dollars in one run. Most of the time you can't tell which one you just triggered.

Why can't I predict my coding agent costs? Because there isn't a stable per-developer number to predict in the first place. The way people use these tools shifts week to week, the models they reach for change, a new model can spawn a whole new use case, and one agentic session can cost many times a single prompt. It's less a control problem than a visibility one: you see what it cost after the fact, attribute it, and learn from it.

Is coding agent spend only an engineering cost? No. Engineering is still the biggest consumer, but marketing, product, design, and others are using the same agents to build sites, content, prototypes, and more. The spend is spreading across functions, which is part of why it's so hard to see.

Why does the choice of model affect cost so much? Each model is billed at its own per-token rate, and those rates vary a lot. The same monthly allowance stretches to roughly two to three times as many requests on a lighter model as on a frontier one, and frontier models can cost five times or more per token than small ones.

Can my existing cloud cost tool see coding agent spend? Usually not. Cloud cost platforms are built around provisioned resources and tags. Coding agent usage is agentless and spread across multiple vendors, which is why it needs its own coverage.