Most agent ROI numbers are fiction. Here's how to produce the real ones.
Your CFO asked: "What's the ROI on the AI agents we're paying for?" You don't have a clean answer. Neither does almost anyone else. "Time saved" estimates are soft. "Tickets deflected" is gameable. "Productivity lift" is a vibe.
The problem isn't that agents don't produce value. It's that most companies never instrumented their agents to measure it. By the time the CFO asks, there's no way to answer honestly.
This module is 90 minutes of building the measurement you needed 6 months ago. By the end:
- Four categories of agent value, each measurable.
- An ROI dashboard for one specific agent in your stack.
- A quarterly review ritual so the CFO's next ask has an answer.
Thinker.
Agents produce value in four categories, and each one takes different measurement work.
- Cost reduction. Hours not spent, headcount avoided, vendor fees not paid. Directly measurable in dollars.
- Revenue produced. Qualified leads created, deals influenced, churn prevented. Measurable with attribution work.
- Quality lift. Lower defect rates, faster cycle times, higher customer satisfaction. Measurable with before/after.
- Capability unlock. Things the team can now do that they couldn't before. Hardest to measure but often the biggest real value.
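One way to keep all four categories in view is to record them side by side. A minimal sketch, assuming Python 3.9+; the `AgentValue` class and the figures in the usage lines are hypothetical, chosen only to show the shape. Dollar categories are stored as (low, high) ranges, quality lift as before/after pairs, and capability unlock as a sentence.

```python
from dataclasses import dataclass, field


@dataclass
class AgentValue:
    """Hypothetical record of one agent's value across the four categories."""

    # Category 1: dollars, as a (low, high) range.
    cost_reduction: tuple[float, float]
    # Category 2: dollars after attribution rules, as a (low, high) range.
    revenue_produced: tuple[float, float]
    # Category 3: metric name -> (before, after) values.
    quality_lift: dict[str, tuple[float, float]] = field(default_factory=dict)
    # Category 4: qualitative, one sentence.
    capability_unlock: str = ""

    def dollar_range(self) -> tuple[float, float]:
        """Combine the two dollar-denominated categories into one (low, high) range."""
        low = self.cost_reduction[0] + self.revenue_produced[0]
        high = self.cost_reduction[1] + self.revenue_produced[1]
        return low, high


# Illustrative numbers only.
value = AgentValue(
    cost_reduction=(20_000, 45_000),
    revenue_produced=(0, 10_000),
    quality_lift={"median first-response time, hours": (6.0, 1.5)},
    capability_unlock="Support now answers overnight without an on-call rep.",
)
print(value.dollar_range())  # (20000, 55000)
```

Quality lift and capability unlock deliberately stay out of the dollar total; forcing them into dollars would take exactly the kind of manufactured precision this module warns against.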
The ROI trap
Reporting only category 1 (cost reduction) makes agents look boring and undervalues capability unlock. A good ROI report covers all four.
The counterfactual problem
ROI is always "what happened vs. what would have happened without the agent." You never see the counterfactual directly. You estimate it. Be honest about the estimate's noise.
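Worked through for cost reduction, the estimate is the counterfactual minus what actually happened. A sketch, assuming you can estimate the counterfactual hours as a range (for example, from a comparable quarter before the agent existed); `savings_range` and the numbers are hypothetical. Carrying the range through the arithmetic keeps the estimate's noise visible instead of collapsing it into one number.

```python
def savings_range(actual_hours: float,
                  counterfactual_hours: tuple[float, float],
                  loaded_rate: float) -> tuple[float, float]:
    """Dollars saved = (estimated hours without the agent - actual hours) x loaded rate.

    The counterfactual is a (low, high) estimate, so the result is a range too.
    A negative value means the agent cost more time than it saved.
    """
    low_cf, high_cf = counterfactual_hours
    return ((low_cf - actual_hours) * loaded_rate,
            (high_cf - actual_hours) * loaded_rate)


# Team spent 300 h with the agent; without it we estimate 600-900 h, at $50/hr.
print(savings_range(300, (600, 900), 50))  # (15000, 30000)
```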
Talker.
The measurement prompt
Run this once per agent, quarterly, to produce a real ROI brief.
You are a finance analyst. I'm producing a quarterly ROI
brief for an AI agent. I'll give you:
- The agent's name and what it does.
- Volume data: # calls, # actions, over the last 90 days.
- Cost data: LLM spend, engineer hours maintaining.
- Outcome data: [relevant metric: tickets resolved, leads
qualified, drafts produced, etc.].
Produce a brief with:
1. Cost incurred (hard dollars).
2. Estimated cost avoided (hours saved × loaded rate).
3. Estimated revenue influence (if applicable, with
assumption stated).
4. Quality signal (before/after, if available).
5. Capability unlock (what can we now do? 1 sentence).
6. Net assessment: positive, break-even, negative. One
sentence of reasoning.
Be conservative on estimates. Flag assumptions. Do not
manufacture precision.
The "do not manufacture precision" line matters. CFOs smell false precision. Ranges beat fake decimals.
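If you run the prompt for several agents, it helps to paste the data in the same labeled layout every time. A minimal sketch; `build_roi_request`, the layout, and the sample values are hypothetical, and `MEASUREMENT_PROMPT` is a placeholder for the full prompt text above.

```python
# Placeholder: paste the full measurement prompt text here.
MEASUREMENT_PROMPT = "You are a finance analyst. I'm producing a quarterly ROI brief ..."


def build_roi_request(agent: str, does: str, calls: int, actions: int,
                      llm_spend: float, eng_hours: float,
                      outcome_metric: str, outcome_value: int) -> str:
    """Append the agent's 90-day data to the prompt in a fixed, labeled layout."""
    data = "\n".join([
        f"Agent: {agent}. What it does: {does}",
        f"Volume, last 90 days: {calls:,} calls, {actions:,} actions",
        f"Cost, last 90 days: LLM spend ${llm_spend:,.0f}; "
        f"{eng_hours:g} engineer hours maintaining",
        f"Outcome, last 90 days: {outcome_metric} = {outcome_value:,}",
    ])
    return f"{MEASUREMENT_PROMPT}\n\n{data}"


print(build_roi_request("support-agent", "answers tier-1 support tickets",
                        5_000, 12_000, 1_200, 40, "tickets resolved", 3_100))
```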
Rememberer.
ROI data has a home. Without one, every quarter you rebuild the dataset.
[company-repo]/roi/
  agents/
    support-agent/
      usage-Q1-2026.csv
      outcomes-Q1-2026.csv
      brief-Q1-2026.md
    sales-agent/
      ...
  loaded-rates.md        (fully-loaded cost per role)
  attribution-rules.md   (how to count influenced revenue)
The loaded-rates file
Finance gives you the fully-loaded hourly cost per role, for example SDR: $80/hr, support rep: $50/hr, engineer: $150/hr. Write it down once so everyone references the same number.
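With the rates written down, the brief's "estimated cost avoided" (hours saved × loaded rate) becomes a lookup plus a sum. A sketch; the rates are the example figures above and `cost_avoided` is a hypothetical helper. An unknown role raises `KeyError` on purpose, so nobody quietly invents a rate.

```python
# Example rates from above, USD per hour, fully loaded. Replace with finance's numbers.
LOADED_RATES = {"SDR": 80, "Support rep": 50, "Engineer": 150}


def cost_avoided(hours_saved_by_role: dict[str, float]) -> float:
    """Sum of hours saved x loaded rate, per role.

    Raises KeyError for a role missing from LOADED_RATES rather than guessing.
    """
    return sum(hours * LOADED_RATES[role]
               for role, hours in hours_saved_by_role.items())


print(cost_avoided({"Support rep": 400, "Engineer": 10}))  # 21500
```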
The attribution rules
Revenue influence is a policy question. Does a lead that the agent qualified and a human closed count 100% for the agent? 50%? 0%? Write the rule. Apply consistently.
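Once the rule is written, applying it consistently can be mechanical. A sketch, assuming each deal is tagged with the agent's role in it; the role names and the 50% / 25% shares are placeholders for whatever your attribution-rules.md decides, not recommendations.

```python
# Placeholder policy: share of a deal credited to the agent, by the agent's role.
ATTRIBUTION_SHARE = {
    "agent_qualified_human_closed": 0.50,
    "agent_assisted": 0.25,
    "no_agent_touch": 0.0,
}


def influenced_revenue(deals: list[tuple[float, str]]) -> float:
    """deals: (amount_usd, agent_role) pairs. Unknown roles raise KeyError."""
    return sum(amount * ATTRIBUTION_SHARE[role] for amount, role in deals)


deals = [
    (20_000, "agent_qualified_human_closed"),
    (50_000, "agent_assisted"),
    (30_000, "no_agent_touch"),
]
print(influenced_revenue(deals))  # 22500.0
```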
Doer.
Twelve minutes. Produce a real ROI brief for one agent in your stack.
Step 1. Pick the agent (1 min)
Whichever you're least sure is paying off.
Step 2. Pull the data (4 min)
Three numbers, minimum:
- LLM spend, last 90 days.
- Engineer hours maintaining, last 90 days.
- One outcome metric (tickets resolved, leads created, drafts shipped).
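The first two numbers combine into the brief's "cost incurred (hard dollars)" line once engineer hours are priced at the loaded rate. A sketch; `cost_incurred` and `cost_per_outcome` are hypothetical helpers, and the $150/hr default is the example engineer rate from the loaded-rates file. Dividing by the outcome metric gives a cost per outcome, which is often easier to discuss than a total.

```python
from typing import Optional


def cost_incurred(llm_spend: float, eng_hours: float,
                  eng_rate: float = 150.0) -> float:
    """Hard-dollar cost for 90 days: LLM spend plus maintenance hours at the loaded rate."""
    return llm_spend + eng_hours * eng_rate


def cost_per_outcome(total_cost: float, outcomes: int) -> Optional[float]:
    """Cost per resolved ticket, created lead, etc. None if there were no outcomes."""
    return total_cost / outcomes if outcomes else None


total = cost_incurred(llm_spend=1_200, eng_hours=40)
print(total)  # 7200.0
print(cost_per_outcome(total, 3_100))
```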
Step 3. Run the measurement prompt (2 min)
Paste the prompt from Talker with your data. Save the output as brief-Q[N].md.
Step 4. Stress-test the assumptions (3 min)
Read every assumption the brief makes. Ask: is this number a vibe or a fact? Adjust. Conservative beats overstated.
Step 5. Share (2 min)
Send it to your CFO, or whoever asked. Make the assumptions visible. Offer to redo it with better data next quarter.
One honest ROI brief. Not a defense, not a pitch. An analysis. That's the baseline for the rest of the year.
- You have no outcome data: instrument now. Every agent needs at least one outcome metric. If it doesn't have one, you can't measure it.
- The brief looks too good: you're being generous. Cut the estimate by 30% and see if it still holds.
- The brief looks bad: that's useful information. Maybe this agent should be retired, not reported.
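The 30% haircut from the second item is quick to check. A sketch with a hypothetical `survives_haircut`; it cuts only the estimated benefits, since cost incurred is hard dollars and stays as-is.

```python
def survives_haircut(cost_avoided: float, revenue_influence: float,
                     cost_incurred: float, haircut: float = 0.30) -> bool:
    """True if benefits, reduced by `haircut`, still cover the hard-dollar cost."""
    benefit = (cost_avoided + revenue_influence) * (1 - haircut)
    return benefit >= cost_incurred


print(survives_haircut(21_500, 0, 7_200))  # True  (about 15,050 vs 7,200)
print(survives_haircut(9_000, 0, 7_200))   # False (about 6,300 vs 7,200)
```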
Rookie.
Failure 1. Reporting activity as outcome
"Our agent handled 5,000 tickets this quarter." That's activity. The outcome question is: were those tickets resolved? At what customer satisfaction? The activity metric hides the outcome.
Fix: every agent dashboard shows activity AND outcome. If you can't measure outcome, start there before claiming value.
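A dashboard row that shows both can be as small as this. A sketch, assuming you can count tickets handled and tickets resolved, and collect CSAT scores per ticket; the function name and field names are hypothetical.

```python
from typing import Optional


def activity_and_outcome(handled: int, resolved: int,
                         csat_scores: list[float]) -> dict[str, Optional[float]]:
    """Put the activity metric next to the outcome metrics it tends to hide."""
    return {
        "handled": handled,  # activity
        "resolution_rate": resolved / handled if handled else None,  # outcome
        "mean_csat": (sum(csat_scores) / len(csat_scores)
                      if csat_scores else None),  # outcome
    }


print(activity_and_outcome(5_000, 3_100, [4.0, 5.0, 3.0]))
# {'handled': 5000, 'resolution_rate': 0.62, 'mean_csat': 4.0}
```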
Failure 2. Fake precision
"This agent saved $127,453.22 this quarter." No it didn't. It saved somewhere between $40k and $200k, with a lot of assumptions. Writing "$127k" hides the range.
Fix: report ranges. "Cost savings: $40k-$200k, assuming X." Executives respect honesty about uncertainty more than false precision.
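Rounding a range outward and attaching its assumption is easy to automate. A sketch; `fmt_savings_range`, the $10k rounding step, and the sample assumption are hypothetical.

```python
import math


def fmt_savings_range(low: float, high: float, assumption: str,
                      step: float = 10_000) -> str:
    """Round a dollar range outward to `step` and state the assumption.

    Flooring the low end and ceiling the high end never makes the range look
    narrower than the estimate behind it.
    """
    lo = math.floor(low / step) * step
    hi = math.ceil(high / step) * step
    return f"Cost savings: ${lo / 1000:,.0f}k-${hi / 1000:,.0f}k, assuming {assumption}."


print(fmt_savings_range(43_120.5, 196_002,
                        "each resolved ticket saves 15-40 minutes of rep time"))
# Cost savings: $40k-$200k, assuming each resolved ticket saves 15-40 minutes of rep time.
```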
Failure 3. Ignoring capability unlock
The agent let you respond to customers 24/7 in 10 languages. That's not in the hours-saved column. It's a new capability.
Fix: report capability unlocks as qualitative wins with quantifiable downstream effects (e.g., "launched in 3 new markets").
Manager.
One owner per agent's ROI
The person who owns the agent also owns its ROI brief. Not finance. Not the data team. The owner. Because the owner knows the assumptions.
The quarterly ritual
Every quarter, every agent owner produces a brief using the prompt from Talker. 30 minutes per agent. Files them in roi/agents/[agent]/brief-Q[N].md.
The team lead compiles them into a one-page summary for leadership. Nothing more ornate needed.
The retirement conversation
If an agent's brief is negative two quarters in a row, schedule a retirement conversation. Maybe the agent should be tuned. Maybe it should be killed. Don't let dead agents linger.
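The two-quarter rule can run against the net assessment line of each brief. A sketch, assuming you have already extracted each quarter's assessment ("positive", "break-even", or "negative") in chronological order; the function name is hypothetical.

```python
def needs_retirement_conversation(assessments: list[str]) -> bool:
    """assessments: net assessments per quarter, oldest first, each
    "positive", "break-even", or "negative".

    Flags the agent when the two most recent quarters are both negative.
    """
    return len(assessments) >= 2 and assessments[-2:] == ["negative", "negative"]


print(needs_retirement_conversation(["positive", "negative", "negative"]))    # True
print(needs_retirement_conversation(["negative", "break-even", "negative"]))  # False
```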
Chief.
Risk 1. ROI as optics
Boards love AI ROI numbers. That creates pressure to produce them, even when the data is thin. Resist.
Governance: if an agent doesn't have clean outcome instrumentation, report "instrumentation in progress, preliminary estimates only." Don't let optics drive fiction.
Risk 2. Underinvestment in measurement
Measurement is boring. Building new agents is exciting. Teams keep shipping new agents without measuring the old ones. Two years later, you have 15 agents and 2 real ROI numbers.
Governance: budget measurement as a percentage of build. 10-20% of agent engineering time goes to measurement infrastructure. Not optional.
Risk 3. The capability-unlock blind spot
The biggest value agents create (doing things you literally couldn't do before) is the hardest to measure. If you only report cost savings, you systematically undervalue the most strategic agents.
Governance: every ROI brief has a "capability unlock" section, even if qualitative. Train leadership to read that section as equally important.
Founder.
Solo founder: your ROI conversations are with yourself and your investors.
The one-page ROI doc
Once a quarter, 30 minutes, update one markdown file:
# Agent ROI, Q[N]
- [agent 1]: what it costs, what it produced, what it unlocked
- [agent 2]: ...
Honest assessment: which I'd keep, which I'd kill, which
I'd double down on.
The gut check
For each agent: if I had to pay its cost out of my own pocket this month, would I? If yes, it's earning. If no, it isn't. The gut answer is usually right.
Measure outcomes, not activity.
Agents are easy to operate and hard to evaluate. The discipline that separates useful AI programs from expensive ones is simple: ship an outcome metric with every agent. Review it quarterly. Kill the losers. Double down on the winners. No shame in either move.