Most AI products that run agents cannot tell you what a job will cost before they run it. That is not a billing bug — it is the nature of the thing. An agentic run is a loop: the model reads a file, calls a tool, reads another file, proposes a fix, checks its own work, and decides whether to keep going. The number of turns is not knowable upfront, so the token spend is not knowable upfront, so the cost is not knowable upfront.
The industry's usual answers all shift that uncertainty onto someone:
- Subscriptions with soft limits — the vendor eats variance on heavy users and quietly throttles them.
- Post-hoc per-token billing — the customer eats variance and finds out at the end of the month.
- Credits — variance gets laundered through an abstraction nobody can reason about.
We did not want any of those. A VibeGuard audit is a one-off purchase: you see a price after your free scan, you pay it with PayPal, and that price is final. No surprise extras. At the same time, we run a real business with a hard rule that every job must clear a minimum gross margin. A variable-cost agent plus a fixed price means someone eats the variance — unless you remove the variance.
Price the ceiling, not the average
The core move is simple to state: we do not price the expected cost of the run. We price the worst case, and then we make the worst case enforceable.
At quote time, the pricing engine estimates a single total token ceiling for the agentic run, derived from your tier, file count, findings, and complexity. That ceiling — not an average, not a guess about typical behavior — is what goes through the pricer and becomes your quote.
Then the part that makes it deterministic: when the paid audit actually executes, the runtime budget enforcer is constructed from the same priced number. We call it the TokenLedger. Every model turn records its input and output tokens against the ledger. A working turn is not even started unless the remaining budget covers a maximum-size turn. If spend somehow exceeded the ceiling anyway, the ledger raises and the job stops — belt and suspenders on top of the admission check.
The result is that executed total ≤ priced total is structural, not statistical. We are not hoping the agent stays under budget on average. The code that spends tokens literally cannot spend more than the code that priced them said it could.
The finalization reserve
A hard ceiling creates a new failure mode: an agent that hits the wall mid-run and delivers nothing. That is worse than useless — you paid for a report.
So the ledger withholds a finalization reserve, sized from the priced budgets for the PR description and the report. Working phases — the reviewers, the fix agent — can only spend what is left after the reserve. However aggressively the working phases explore, the engine always has enough tokens left to write your deliverables. And if the model persistently returns empty output during finalization, the job fails loudly and visibly rather than shipping a blank report. A silent empty delivery is treated as a bug, not an outcome.
Counting tokens like money
Deterministic pricing only works if the token counts going into the quote are real. Local tokenizer estimates routinely under-count the model's actual tokenization by 5–15%. Under-counting at quote time means under-charging — losing money silently on every job.
So we refuse to estimate. Quote-time input counts come from the model provider's official token-counting API. If that API is unavailable, we do not fall back to a local estimator — we return a "pricing temporarily unavailable" error and page ourselves. We never quote a number we cannot stand behind.
The same discipline applies to prompt caching. Caching can cut input cost by 90% on cached prefixes, and the temptation is to assume the cache hits and quote cheaper. We assume zero cache hits. Caches miss for reasons you do not control — new prefix, eviction, model version change. If a cache does hit, our actual margin widens; the customer never pays for our optimism.
The profitability gate
Every quote, every checkout, and every execution re-verifies that the gross margin clears a configured minimum. If a preliminary price does not clear the gate, the pricer iteratively raises it until it does — and if the price would have to exceed a sanity ceiling to be profitable, we refuse to quote at all rather than sell at a loss or at an absurd price.
No unpriced retries
One last leak to plug: what if your code changes between the quote and the run? The scope of what the audit will see is captured at quote time as a SHA-256 hash over a canonical description of the tier, model, prompt version, and every included file. That hash is recomputed before any paid model call. If it does not match — a new push, a re-upload — the audit aborts before spending anything, and our team reviews your payment for credit or refund. We never silently re-run a priced job against different content, because that run would be unpriced.
What this buys you
The practical consequence is a sentence we can put in writing: the quote is a guaranteed cost ceiling. Not "typically around," not "estimated." The price you accept is the price you pay, and the machine that does the work is physically incapable of spending past what you paid for.
It also keeps us honest about depth. Want a deeper audit? That is a bigger ceiling, which is a higher price, visible upfront — not a quiet quality cut on a fixed-price job that ran long.
Variable-cost agents are usually priced with hope. We priced ours with a ledger.