Strategy · 10 min read

Measuring the ROI of an AI Project

How a small business can judge whether an AI build actually pays off, from baselining to a simple ROI calculation.

By Luka Filips

Key Takeaways

  • AI return on investment is the value a build creates minus its cost, divided by that cost, and the hard part is being honest about the inputs rather than doing the arithmetic.
  • You cannot prove an AI build improved anything without a baseline that records the time, cost and performance of the current process before you start.
  • Leading indicators such as adoption and error rate move within weeks and let you course-correct, while lagging indicators such as hours saved and revenue are the real returns but arrive on a delay.
  • Hard returns (hours saved, cost avoided, revenue influenced) should be counted in dollars, while soft returns (quality, consistency, capacity freed) are real but should be reported separately rather than dressed up as cash.
  • A simple AI ROI calculation (annual value gained minus annual cost, divided by annual cost) only holds up if you use a loaded hourly rate, include all running costs, and discount hours saved that are not redeployed.
  • Outcome metrics tied to money, time or customer-visible quality beat vanity metrics like logins and messages sent. MIT Sloan Management Review makes the same point about technical model metrics, which it says give no direct reading on business value.
  • Returns are uncertain and often slow: the RBA reports mixed AI returns to date, and S&P Global found 42% of companies abandoned most AI initiatives in early 2025, so measurement is what separates builds worth scaling from ones worth stopping.

Most small businesses approve an AI project the way they approve a new hire: on a gut feeling that it will help. Then, six months in, no one can say whether it did. The invoice was real. The benefit is a shrug.

That gap is the whole problem with measuring the ROI of an AI project. AI return on investment, spelled out in full, is simply the value an AI build creates minus what it costs, divided by that cost. The arithmetic is easy. The hard part is being honest about the inputs, because AI vendors quote savings that never reach the bank account, and owners count activity instead of outcomes.

The stakes are rising. Australian businesses invested $668.3 million in AI research and development in 2023-24, up 142% in two years, making AI the fastest-growing area of business R&D, according to the Australian Bureau of Statistics. More spend means more pressure to prove it paid off. This article lays out how a small business should judge whether an AI build actually pays: baseline first, separate leading from lagging indicators, count hard and soft returns honestly, run a simple payback calculation, and refuse to be fooled by vanity metrics.

Be honest: AI returns are uncertain and often slow

Start from reality, not the sales deck. The Reserve Bank of Australia surveyed 105 medium-to-large firms in November 2025 and reported that returns on AI investment "have been mixed to date and they expect the returns will take time to be realised." The RBA also found adoption among smaller firms lags larger ones, and that uptake so far has been "relatively piecemeal," often employee-led rather than employer-led. Piecemeal adoption is hard to measure because no one owns the outcome.

The failure rate is not a footnote either. S&P Global Market Intelligence found the share of companies abandoning most of their AI initiatives jumped to 42% in early 2025, up from 17% a year earlier, as reported by CIO Dive, with the average organisation scrapping 46% of proof-of-concepts before production. We are not repeating those numbers to scare anyone off AI. We are making a narrower point: if nearly half of attempts stall, your money should follow evidence, not enthusiasm. Measurement is the discipline that tells the difference between the projects worth scaling and the ones worth killing.

You cannot measure improvement without a baseline

Here is the rule we apply before any build: if you have not measured the current state, you cannot prove the new state is better. A baseline is a snapshot of how a process performs today, in numbers, before AI touches it.

For a single workflow, capture three things. How long it takes (minutes per task, hours per week). What it costs (labour at a loaded hourly rate, plus any tools). And how well it performs (error rate, turnaround time, conversion, customer wait). Write those numbers down with a date. That record is the business case, and it is the thing you will compare against in ninety days.
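One way to make that record concrete is a small dated structure. The sketch below is illustrative only: the `Baseline` class, its field names and the sample figures are our assumptions for the example, not output from any particular tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Baseline:
    """Dated snapshot of one workflow before any AI touches it."""
    workflow: str
    recorded_on: date
    minutes_per_task: float     # how long it takes
    tasks_per_week: int
    loaded_hourly_rate: float   # wage plus on-costs, in dollars
    weekly_tool_cost: float     # tools the process already pays for
    error_rate: float           # how well it performs (0.05 means 5%)

    def hours_per_week(self) -> float:
        return self.minutes_per_task * self.tasks_per_week / 60

    def weekly_cost(self) -> float:
        labour = self.hours_per_week() * self.loaded_hourly_rate
        return labour + self.weekly_tool_cost


# Hypothetical figures for an invoice-entry workflow.
invoice_entry = Baseline(
    workflow="invoice entry",
    recorded_on=date(2025, 1, 6),
    minutes_per_task=12,
    tasks_per_week=200,
    loaded_hourly_rate=45.0,
    weekly_tool_cost=20.0,
    error_rate=0.05,
)

print(f"{invoice_entry.hours_per_week():.1f} hours/week, "
      f"${invoice_entry.weekly_cost():,.2f}/week")
```

Freezing the record is a deliberate choice in this sketch: once the before-figures are written down with a date, nothing should edit them later.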

Skipping the baseline is the most common and most expensive mistake we see. Without it, every claim afterwards is a guess dressed as a result. "It feels faster" is not a number. We treat baselining as the opening move of the discovery and audit phase, because the audit that maps where time and money actually go is the same audit that produces your before-figures. Measure twice, build once.

A baseline also protects you from over-claiming later. When the figure is on paper, no one can quietly inflate the win after the fact.

Leading indicators versus lagging indicators

Returns on an AI build arrive on a delay, so you need two kinds of signal.

Lagging indicators are the outcomes you ultimately care about: hours saved per month, cost avoided, revenue influenced, profit. They are the truth, but they show up late, sometimes a quarter or two after launch. If you wait only for these, you fly blind for months.

Leading indicators are the early signs that the lagging result is coming: adoption by staff, task completion rate, error rate trending down, time-per-task falling. They move within weeks and let you course-correct before you have spent the full budget.

A worked rhythm: in the first month, watch leading indicators to confirm the system is being used and is accurate. By month three, check whether the lagging numbers (hours, cost, revenue) have actually moved against the baseline. If the leading indicators are strong but the lagging ones never follow, that is your early warning that the build is producing activity, not value. That distinction is the heart of the next section.
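That rhythm can be written down as a simple check. This is a minimal sketch under stated assumptions: the function name, the 60% adoption threshold and the wording of the messages are hypothetical, and any real thresholds should come from your own baseline.

```python
def read_signals(months_since_launch: int,
                 adoption_rate: float,
                 error_rate: float,
                 baseline_error_rate: float,
                 hours_saved_per_month: float) -> str:
    """Combine leading and lagging indicators into one plain-language reading."""
    # Leading indicators: is the system used, and is it at least as accurate
    # as the old process? (The 0.6 adoption threshold is an assumption.)
    leading_ok = adoption_rate >= 0.6 and error_rate <= baseline_error_rate
    # Lagging indicator: have hours actually moved against the baseline?
    lagging_moved = hours_saved_per_month > 0

    if not leading_ok:
        return "course-correct: low adoption or accuracy below baseline"
    if months_since_launch < 3:
        return "on track: keep watching leading indicators"
    if lagging_moved:
        return "value confirmed: lagging numbers moved against the baseline"
    return "warning: activity, not value"


# Month one: strong usage, too early to judge the lagging numbers.
print(read_signals(1, 0.8, 0.03, 0.05, 0.0))
# Month three: still busy, but no hours returned.
print(read_signals(3, 0.8, 0.03, 0.05, 0.0))
```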

Hard returns versus soft returns

Value from an AI build comes in two grades. Be rigorous about the first and honest about the second.

Hard returns convert to dollars without much argument:

  • Hours saved. Time-per-task before minus after, multiplied by volume and a loaded hourly rate.
  • Cost avoided. Headcount you did not need to add, software you retired, error correction you no longer pay for.
  • Revenue influenced. Faster lead response, more quotes out the door, fewer abandoned carts. Influenced, not "caused", which matters for attribution below.
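The hours-saved item is the one most often miscalculated, so here is a minimal sketch of it. The function name and the sample inputs are hypothetical; the formula is the one in the list above.

```python
def hours_saved_value(minutes_before: float,
                      minutes_after: float,
                      tasks_per_year: int,
                      loaded_hourly_rate: float) -> float:
    """Dollar value of time saved: (before - after) x volume x loaded rate."""
    # Never credit a negative saving as a return.
    minutes_saved = max(minutes_before - minutes_after, 0.0)
    return minutes_saved * tasks_per_year * loaded_hourly_rate / 60


# Hypothetical inputs: 12 minutes per task down to 5,
# 10,000 tasks a year, $45 an hour loaded.
value = hours_saved_value(12, 5, 10_000, 45.0)
print(f"${value:,.0f} a year")
```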

Soft returns are real but resist a clean dollar figure:

  • Quality and consistency. The same correct answer every time, which an internal AI knowledge assistant delivers where a busy human varies.
  • Capacity freed. Owner attention redirected from admin to selling and strategy.
  • Resilience. Knowledge that no longer walks out the door when one person leaves.

Do not pretend soft returns are zero, and do not pretend they are bankable cash. We list them separately in any business case, quantified where we honestly can, described where we cannot. An owner who hides soft value undersells the project; an owner who dresses it up as hard cash misleads themselves.

A simple AI ROI calculation

You do not need a consultant's spreadsheet. You need one formula and disciplined inputs.

ROI (%) = (Annual value gained - Annual cost) / Annual cost × 100

Payback period (months) = Total upfront cost / (Monthly value gained - Monthly running cost)

Work it with a small, deliberately modest example. Say a process consumes 40 hours a week of admin at a loaded rate of $45 an hour. That is $93,600 a year. An AI automation removes 60% of it: $56,160 of value a year. The build costs $30,000 upfront plus $400 a month in running costs, so $34,800 in year one.

  • Year-one ROI: ($56,160 - $34,800) / $34,800 = 61%.
  • Payback: $30,000 / ($4,680 monthly value - $400 running) = about 7 months.
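For anyone who would rather check the arithmetic than trust it, the same example can be reproduced in a few lines. This is a sketch, not a tool: the function and variable names are ours, and the inputs are the modest example figures above.

```python
def roi_percent(annual_value: float, annual_cost: float) -> float:
    """ROI (%) = (annual value gained - annual cost) / annual cost x 100."""
    return (annual_value - annual_cost) / annual_cost * 100


def payback_months(upfront_cost: float,
                   monthly_value: float,
                   monthly_running_cost: float) -> float:
    """Months to recover the upfront cost from net monthly value."""
    net_monthly = monthly_value - monthly_running_cost
    if net_monthly <= 0:
        return float("inf")  # the build never pays back
    return upfront_cost / net_monthly


# Inputs from the worked example.
hours_per_week = 40
loaded_rate = 45.0
share_automated = 0.60
upfront_cost = 30_000.0
monthly_running_cost = 400.0

annual_process_cost = hours_per_week * loaded_rate * 52        # $93,600
annual_value = annual_process_cost * share_automated           # $56,160
year_one_cost = upfront_cost + monthly_running_cost * 12       # $34,800

roi = roi_percent(annual_value, year_one_cost)
payback = payback_months(upfront_cost, annual_value / 12, monthly_running_cost)

print(f"Year-one ROI: {roi:.1f}%")         # about 61%
print(f"Payback: {payback:.1f} months")    # about 7 months
```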

Three rules keep the calculation honest. Use a loaded hourly rate (wage plus on-costs), not the bare wage. Include every running cost: model usage, maintenance, oversight. And discount the hours saved if the freed time is not actually redeployed into useful work; an hour "saved" that becomes an hour of scrolling is not a return. Run the same formula with a conservative case and an optimistic case, and make the decision on the conservative one.
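A conservative and an optimistic case can sit side by side in the same calculation. In the sketch below, the `Scenario` fields, the redeployment discount and every scenario figure are assumptions chosen for illustration; only the $93,600 process cost and $30,000 build cost come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    share_automated: float       # fraction of the process the build removes
    redeployed_share: float      # fraction of freed hours put to useful work
    monthly_running_cost: float  # model usage, maintenance, oversight


def year_one_roi(s: Scenario, annual_process_cost: float,
                 upfront_cost: float) -> float:
    # Discount the hours saved by the share that is actually redeployed.
    value = annual_process_cost * s.share_automated * s.redeployed_share
    cost = upfront_cost + s.monthly_running_cost * 12
    return (value - cost) / cost * 100


ANNUAL_PROCESS_COST = 93_600.0
UPFRONT_COST = 30_000.0

conservative = Scenario("conservative", 0.50, 0.80, 500.0)
optimistic = Scenario("optimistic", 0.60, 1.00, 400.0)

for s in (conservative, optimistic):
    roi = year_one_roi(s, ANNUAL_PROCESS_COST, UPFRONT_COST)
    print(f"{s.name}: {roi:.1f}% year-one ROI")

# Decide on the conservative case, not the optimistic one.
proceed = year_one_roi(conservative, ANNUAL_PROCESS_COST, UPFRONT_COST) > 0
print("proceed" if proceed else "rethink the build")
```

With these illustrative inputs the conservative case only just clears break-even in year one, which is exactly the kind of result the optimistic case would have hidden.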

Why outcome metrics beat vanity metrics

This is the line that separates measurement from theatre. MIT Sloan Management Review argues that technical model metrics such as precision, recall and lift give "no direct reading on the absolute business value" of an AI system, and that "the focus should be on business metrics, such as revenue, profit, savings, and number of customers acquired."

Translate that for a small business. Logins, messages sent, prompts run, documents processed and "engagement" are vanity metrics. They tell you the tool is busy. They do not tell you the business is better off. A chatbot can answer ten thousand questions and still deflect nothing if customers re-contact you anyway. Tie every metric back to a baseline outcome: tickets actually resolved without a human, hours genuinely returned, quotes that turned into revenue. If a number cannot be traced to money, time or a quality outcome a customer would notice, treat it as diagnostic, not as proof of return.

Process change is usually what converts a tool into a result. Buying AI and bolting it onto a broken workflow rarely moves the lagging numbers; redesigning the workflow around it does. That is why we plan the process and the build together rather than treating AI as a feature you switch on. Our note on planning AI builds walks through scoping the workflow first so the metrics have something real to measure.

A worked example: outcome-led measurement

Our Lead Management System build shows the discipline in practice. The baseline was painful and specific: leads handled manually across disconnected tools, with reporting cobbled together by hand. The system we built saved more than 1,500 hours per month and auto-generates the reports that used to eat days.

Notice how those figures behave. "1,500+ hours per month" is a hard return measured against a real before-state, the kind of number you can multiply by a loaded rate to get cost avoided. "Auto-generated reports" is partly hard (hours no longer spent compiling) and partly soft (managers see current numbers instead of stale ones, so decisions improve). We report both, kept separate. And we resisted the vanity version of the story, which would have boasted about messages processed or workflows triggered. Volume was never the point. Hours returned and reporting that runs itself were the point, because those are the outcomes the owner could feel and bank.

Attribution, over-claiming, and the limits of measurement

Two honest cautions, because credibility is the only currency that matters here.

First, attribution. If revenue rises after an AI build, the AI rarely deserves all the credit. A new salesperson, a seasonal lift or a price change may share it. This is why we use the word "influenced" for revenue rather than "caused", and why a baseline plus a sensible comparison window beats a triumphant before-and-after screenshot. Where you can, change one thing at a time.

Second, over-claiming. Some value is genuinely indirect and slow, exactly as the RBA found. Resist the urge to convert every soft benefit into a dollar figure to make a slide look better. A measured 61% return you can defend is worth more than a fictional 300% no one believes. The point of measurement is not to win an argument; it is to decide, with eyes open, whether to scale, adjust or stop.

The Enki approach: a disciplined Evaluate step

We treat measurement as a named stage, not an afterthought. The NIST AI Risk Management Framework defines a Measure function that "employs quantitative, qualitative, or mixed-method tools, techniques, and methodologies to analyze, assess, benchmark, and monitor AI risk and related impacts." We borrow that posture and apply it to value, not just risk: baseline before we build, instrument the system to capture leading and lagging indicators, then evaluate against the before-state at a set date.

This matters because the gap between adopting AI and capturing value is wide. Deloitte Access Economics found two-thirds of Australian SMBs use AI, but just 5% are fully enabled to realise its benefits, with $44 billion in potential GDP if more of them advanced even one rung. The businesses that move from the two-thirds to the 5% are not the ones with the cleverest models. They are the ones who measured, redesigned the process, and kept what worked.

We cannot promise a guaranteed return, and we would not trust anyone who does. What we offer is the discipline that makes returns visible and defensible: a baseline, a small honest ROI calculation, outcome metrics over vanity ones, and a clear-eyed evaluate step that is willing to say a build is not paying off. Done that way, an AI project stops being a leap of faith and becomes a decision you can stand behind.

Frequently Asked Questions

How do you measure the ROI of an AI project?

Start by baselining the current process: record how long it takes, what it costs at a loaded hourly rate, and how well it performs, all with a date. After launch, track leading indicators (adoption, error rate) within weeks and lagging indicators (hours saved, cost avoided, revenue influenced) by around month three. Then apply a simple formula: ROI equals annual value gained minus annual cost, divided by annual cost. The discipline is in honest inputs, not the maths.

Is an AI project worth it for a small business?

Sometimes, and the only way to know is to measure rather than assume. Returns are genuinely uncertain: the Reserve Bank of Australia reports AI returns have been mixed and take time to appear, and S&P Global found 42% of companies abandoned most AI initiatives in early 2025. A build is worth it when a conservative ROI calculation, run against a real baseline, shows a payback period you are comfortable with and the freed time is actually redeployed into useful work.

What is a good payback period for an AI project?

There is no universal number, but for a small-business automation we look for the upfront cost to be recovered within roughly six to twelve months on conservative assumptions. Calculate it as total upfront cost divided by monthly value gained, after subtracting running costs like model usage, maintenance and oversight. Run an optimistic and a conservative case, then decide on the conservative one. A defensible payback you believe beats an aggressive one you do not.

What is the difference between hard and soft returns?

Hard returns convert to dollars with little argument: hours saved, cost avoided, and revenue influenced. Soft returns are real but resist a clean figure: better quality and consistency, capacity freed for the owner, and knowledge that no longer leaves when a staff member does. Report both, but keep them separate. Hiding soft value undersells the project, while dressing it up as cash misleads you.

Why do outcome metrics beat vanity metrics?

Vanity metrics such as logins, prompts run and messages sent tell you a tool is busy, not that the business is better off. MIT Sloan Management Review argues that technical model metrics give no direct reading on business value, and that the focus should be on revenue, profit, savings and customers acquired. The same logic applies to activity counts. Always trace a metric back to money, time, or a quality outcome a customer would notice; if it cannot be traced, treat it as diagnostic, not proof of return.

Can you guarantee a return on an AI project?

No, and we would be cautious of anyone who does. Some AI value is indirect and slow, attribution is hard because other factors also move revenue, and a large share of projects stall before production. What a disciplined partner can offer is a baseline, an honest ROI calculation, outcome-led metrics, and an evaluate step willing to recommend stopping a build that is not paying off. That makes the return visible and defensible, which is more valuable than a promise.
