Your AI bill just surprised you. You're not alone.
A founder emails: "we were at $400/month in January. April bill just came in at $18,000. What happened?" Happened to five founders I know this year.
LLM spend is not a flat subscription. It's usage-based. It scales with your agents, your users, your prompts, your context windows, your retry logic. Most companies never set up cost control until the surprise bill arrives.
This module is 90 minutes of building the cost discipline that prevents the surprise. By the end:
- A mental model of where costs actually hide.
- A per-agent cost dashboard.
- Cost caps with alerting, before you need them.
Thinker.
Four cost categories. Each needs a different control.
- Per-call cost. Input tokens + output tokens × per-token price. The fundamental unit.
- Context bloat. Every turn in a conversation adds tokens. Long conversations cost exponentially more than short ones.
- Retry cost. When an agent fails, it often retries silently. 3 retries = 3× cost on that call.
- Background cost. Scheduled agents, batch jobs, eval suites. Easy to forget, add up fast.
The 80/20
80% of AI cost surprises come from two things: a bug in retry logic, or context bloat from long conversation threads. Fix those and your bill is predictable.
Cost is not the same as value
High cost is not automatically bad. A $50 agent call that qualifies a $50k deal is fine. A $0.05 call that produces nothing is wasteful. Always pair cost with outcome (Module 023).
Talker.
The cost review prompt
Run monthly, per agent.
You are a cost auditor. Here is 30 days of LLM usage data
for one agent:
- # calls: N
- Avg input tokens per call: X
- Avg output tokens per call: Y
- Retry rate: Z%
- Top 5 most expensive traces: [paste]
- Outcome metric for this period: [e.g., tickets resolved]
Produce a review with:
1. Total spend.
2. Cost per outcome (spend / N_outcomes).
3. Flag: is input token usage trending up? (context bloat)
4. Flag: is retry rate above 10%? (retry cost)
5. Top 3 specific opportunities to cut cost without losing
outcome.
Be specific. "Shorten the system prompt by 30%" is specific.
"Optimize prompts" is not.
Use the output to open engineering tickets. Include the data. Don't ask engineering to guess.
Rememberer.
Cost data has a home.
[company-repo]/costs/
dashboards/
by-agent.csv (updated nightly)
by-day.csv
reviews/
[agent]-[month].md (monthly per-agent reviews)
caps.md (per-agent budget caps)
Caps are code
Every agent has a daily dollar cap. When the cap is hit, the agent refuses further calls and pages the owner. Better to degrade gracefully than to wake up to a $10,000 bill.
The monthly review
First Monday of each month, 30 minutes. Every agent owner runs the cost review prompt on their agent. Files the output.
Top 3 cost-cutting opportunities go into the engineering backlog. Not as wishes, as tickets.
Doer.
Twelve minutes. Audit one agent's cost, today.
Step 1. Pick the agent (1 min)
The one you're least sure is cost-effective, or the one that's growing fastest.
Step 2. Pull 30 days of data (4 min)
From your provider dashboard or your own logs. Five numbers: total calls, avg input, avg output, total cost, outcome count.
Step 3. Run the cost review prompt (2 min)
Paste from Talker with your numbers. Get a review.
Step 4. Pick one action (3 min)
Read the top 3 opportunities. Pick one. Write down what you'll do this week to implement it.
Step 5. Set a cap (2 min)
Pick a daily dollar cap that's 1.5x your current daily spend. Implement it in code. If the agent doesn't currently enforce caps, add that as a second action.
One reviewed agent. One action to cut cost. One cap preventing surprise. Next month's surprise averted.
- You have no outcome data: you can't do proper cost-per-outcome. Instrument first (Module 023), then come back.
- Cost seems low but bills are high: look for background jobs, eval suites, or cron'd agents. They're easy to forget.
- The agent retries a lot: that's an engineering bug, not a cost issue. Fix the retry logic first.
Rookie.
Failure 1. No caps
Your agent has no ceiling. A bug, a bad actor, or a runaway conversation burns $10,000 in a weekend.
Fix: hard dollar caps per agent, per day. Enforce in code. Fail closed.
Failure 2. Context bloat in long conversations
A user conversation goes 30 turns. Each turn carries the full history. By turn 30, each call is 15x the token cost of turn 1.
Fix: summarize history every N turns. Drop stale tool results. Conversations should have a context budget.
Failure 3. Silent retries
Your framework retries failed calls automatically. You never see the retries. You just see a bigger bill.
Fix: log every retry. Dashboard the retry rate. If retry rate is > 10%, you have a bug, not a cost problem.
Manager.
Budget per agent, not just per company
Company-level LLM budget is too coarse. One agent going off the rails drains everything. Per-agent caps catch problems before they compound.
Monthly cost review meeting
30 minutes, first Monday. Each agent owner presents: spend, outcome, cost per outcome, one action for the month. Team lead tracks trends.
Cost as owner responsibility
The agent owner owns the cost. Not finance. Not engineering. The owner. They get the dashboard, they get the cap alerts, they get the monthly review. Alignment is automatic when the accountability is clear.
Chief.
Risk 1. Budget surprise
A 40x cost overrun in a quarter hits the P&L at exactly the moment leadership is justifying the AI budget to the board. Worst possible timing.
Governance: caps + alerts + monthly reviews. No agent ships without cost instrumentation.
Risk 2. Vendor lock
Every prompt is written for one model family. Moving to a cheaper alternative is possible but costly. Don't assume today's pricing will hold.
Governance: annual "model migration dry run." Take your top 3 agents. Run them on an alternative provider. Know your options.
Risk 3. Cost-cutting that breaks outcomes
Cheaper model ≠ same outcome. Cost cuts that regress resolution rate are bad trades.
Governance: every cost optimization goes through the eval suite (Module 006) before shipping. Cost wins only count if outcomes hold.
Founder.
Solo founder: the monthly LLM bill is something you glance at on your card statement, until it isn't.
The solo cost stack
- Hard cap per agent. Daily dollar limit in code.
- One dashboard, one page, showing spend per agent this month.
- Weekly 5-minute check-in with that dashboard. Anomalies get handled immediately.
- Monthly cost review prompt, per agent.
The cheap-first instinct
Start every new agent on the cheapest competent model. Upgrade only if the outcome requires it. Most agents don't. The instinct to reach for the flagship model is expensive and usually wrong.
Cost is a dial, not a surprise.
If your bill surprises you, you didn't have cost control, you had hope. Every agent gets a cap. Every month gets a review. The bill becomes boring, which is the goal.