Most small businesses approve an AI project the way they approve a new hire: on a gut feeling that it will help. Then, six months in, no one can say whether it did. The invoice was real. The benefit is a shrug.
That gap is the whole problem with measuring the ROI of an AI project. AI return on investment, spelled out in full, is simply the value an AI build creates minus what it costs, divided by that cost. The arithmetic is easy. The hard part is being honest about the inputs, because AI vendors quote savings that never reach the bank account, and owners count activity instead of outcomes.
The stakes are rising. Australian businesses invested $668.3 million in AI research and development in 2023-24, up 142% in two years, making AI the fastest-growing area of business R&D, according to the Australian Bureau of Statistics. More spend means more pressure to prove it paid off. This article lays out how a small business should judge whether an AI build actually pays: baseline first, separate leading from lagging indicators, count hard and soft returns honestly, run a simple payback calculation, and refuse to be fooled by vanity metrics.
Be honest: AI returns are uncertain and often slow
Start from reality, not the sales deck. The Reserve Bank of Australia surveyed 105 medium-to-large firms in November 2025 and reported that returns on AI investment "have been mixed to date and they expect the returns will take time to be realised." The RBA also found adoption among smaller firms lags larger ones, and that uptake so far has been "relatively piecemeal," often employee-led rather than employer-led. Piecemeal adoption is hard to measure because no one owns the outcome.
The failure rate is not a footnote either. S&P Global Market Intelligence found the share of companies abandoning most of their AI initiatives jumped to 42% in early 2025, up from 17% a year earlier, as reported by CIO Dive, with the average organisation scrapping 46% of proof-of-concepts before production. We are not repeating those numbers to scare anyone off AI. We are making a narrower point: if nearly half of attempts stall, your money should follow evidence, not enthusiasm. Measurement is the discipline that tells the difference between the projects worth scaling and the ones worth killing.
You cannot measure improvement without a baseline
Here is the rule we apply before any build: if you have not measured the current state, you cannot prove the new state is better. A baseline is a snapshot of how a process performs today, in numbers, before AI touches it.
For a single workflow, capture three things. How long it takes (minutes per task, hours per week). What it costs (labour at a loaded hourly rate, plus any tools). And how well it performs (error rate, turnaround time, conversion, customer wait). Write those numbers down with a date. That record is the business case, and it is the thing you will compare against in ninety days.
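One way to keep that record honest is to store it as structured data rather than a note in someone's inbox. The sketch below is a minimal illustration, not a prescribed format: the `Baseline` class, its field names, the "invoice entry" workflow and all the figures are our own assumptions, chosen to line up with the worked example later in this article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Baseline:
    """Dated snapshot of one workflow before AI touches it (hypothetical schema)."""
    workflow: str
    recorded_on: date
    minutes_per_task: float      # how long it takes
    tasks_per_week: int
    loaded_hourly_rate: float    # what it costs: wage plus on-costs
    weekly_tool_cost: float
    error_rate: float            # how well it performs: share of tasks needing rework

    def hours_per_week(self) -> float:
        return self.minutes_per_task * self.tasks_per_week / 60

    def weekly_cost(self) -> float:
        return self.hours_per_week() * self.loaded_hourly_rate + self.weekly_tool_cost


# Illustrative figures only; replace with your own measurements.
invoice_entry = Baseline(
    workflow="invoice entry",
    recorded_on=date(2025, 1, 6),
    minutes_per_task=12,
    tasks_per_week=200,
    loaded_hourly_rate=45.0,
    weekly_tool_cost=0.0,
    error_rate=0.04,
)

print(f"{invoice_entry.hours_per_week():.1f} hours/week, "
      f"${invoice_entry.weekly_cost():,.2f}/week")
```

Freezing the record (`frozen=True`) is a deliberate choice here: a baseline that can be edited after the fact is not much of a baseline.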
Skipping the baseline is the most common and most expensive mistake we see. Without it, every claim afterwards is a guess dressed as a result. "It feels faster" is not a number. We treat baselining as the opening move of the discovery and audit phase, because the audit that maps where time and money actually go is the same audit that produces your before-figures. Measure twice, build once.
A baseline also protects you from over-claiming later. When the figure is on paper, no one can quietly inflate the win after the fact.
Leading indicators versus lagging indicators
Returns on an AI build arrive on a delay, so you need two kinds of signal.
Lagging indicators are the outcomes you ultimately care about: hours saved per month, cost avoided, revenue influenced, profit. They are the truth, but they show up late, sometimes a quarter or two after launch. If you wait only for these, you fly blind for months.
Leading indicators are the early signs that the lagging result is coming: adoption by staff, task completion rate, error rate trending down, time-per-task falling. They move within weeks and let you course-correct before you have spent the full budget.
A worked rhythm: in the first month, watch leading indicators to confirm the system is being used and is accurate. By month three, check whether the lagging numbers (hours, cost, revenue) have actually moved against the baseline. If the leading indicators are strong but the lagging ones never follow, that is your early warning that the build is producing activity, not value. That distinction between activity and value is the heart of the section on vanity metrics further down.
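The rhythm above can be written down as a small decision rule. This is a rough sketch under our own assumptions: the function name, the status labels and the three-month cut-off are illustrative, and in practice "healthy" and "moved" would each need a threshold agreed against your baseline.

```python
def build_status(leading_healthy: bool, lagging_moved: bool, months_live: int) -> str:
    """Classify an AI build from its two kinds of signal (hypothetical labels)."""
    if months_live < 3:
        # Too early to judge outcomes: only the leading indicators are informative.
        return "on track" if leading_healthy else "course-correct now"
    if lagging_moved:
        # Hours, cost or revenue have shifted against the baseline.
        return "delivering value"
    # From month three, strong usage without outcomes is the warning sign.
    return "activity without value" if leading_healthy else "not working"


print(build_status(leading_healthy=True, lagging_moved=False, months_live=1))
print(build_status(leading_healthy=True, lagging_moved=False, months_live=4))
```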
Hard returns versus soft returns
Value from an AI build comes in two grades. Be rigorous about the first and honest about the second.
Hard returns convert to dollars without much argument:
- Hours saved. Time-per-task before minus after, multiplied by volume and a loaded hourly rate.
- Cost avoided. Headcount you did not need to add, software you retired, error correction you no longer pay for.
- Revenue influenced. Faster lead response, more quotes out the door, fewer abandoned carts. Influenced, not "caused", which matters for attribution below.
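The hours-saved line is the one most worth computing rather than estimating. A minimal sketch, assuming you have time-per-task from the baseline and from a post-launch measurement; the function name and the sample figures are ours, for illustration only.

```python
def hours_saved_value(
    minutes_before: float,
    minutes_after: float,
    tasks_per_month: int,
    loaded_hourly_rate: float,
) -> float:
    """Dollar value of time saved per month: (before - after) x volume x loaded rate."""
    minutes_saved = (minutes_before - minutes_after) * tasks_per_month
    return minutes_saved * loaded_hourly_rate / 60


# Illustrative: a task drops from 12 to 5 minutes, 800 times a month, at $45/hour loaded.
monthly_value = hours_saved_value(12, 5, 800, 45.0)
print(f"${monthly_value:,.2f} per month")
```

If the after-figure is higher than the before-figure, the result goes negative, which is the correct and uncomfortable answer.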
Soft returns are real but resist a clean dollar figure:
- Quality and consistency. The same correct answer every time, which an internal AI knowledge assistant delivers where a busy human varies.
- Capacity freed. Owner attention redirected from admin to selling and strategy.
- Resilience. Knowledge that no longer walks out the door when one person leaves.
Do not pretend soft returns are zero, and do not pretend they are bankable cash. We list them separately in any business case, quantified where we honestly can, described where we cannot. An owner who hides soft value undersells the project; an owner who dresses it up as hard cash misleads themselves.
A simple AI ROI calculation
You do not need a consultant's spreadsheet. You need one formula and disciplined inputs.
ROI (%) = (Annual value gained minus Annual cost) / Annual cost x 100
Payback period (months) = Total upfront cost / (Monthly value gained minus Monthly running cost)
Work it with a small, deliberately modest example. Say a process consumes 40 hours a week of admin at a loaded rate of $45 an hour. That is $93,600 a year. An AI automation removes 60% of it: $56,160 of value a year. The build costs $30,000 upfront plus $400 a month in running costs, so $34,800 in year one.
- Year-one ROI: ($56,160 - $34,800) / $34,800 = 61%.
- Payback: $30,000 / ($4,680 monthly value - $400 running) = about 7 months.
Three rules keep the calculation honest. Use a loaded hourly rate (wage plus on-costs), not the bare wage. Include every running cost: model usage, maintenance, oversight. And discount the hours saved if the freed time is not actually redeployed into useful work; an hour "saved" that becomes an hour of scrolling is not a return. Run the same formula with a conservative case and an optimistic case, and make the decision on the conservative one.
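For anyone who would rather run the numbers than trust a spreadsheet cell, here is a short sketch of the same two formulas. The base case uses the figures from the example above; the conservative (40%) and optimistic (70%) shares are hypothetical values we picked to show the comparison, and the function names are our own.

```python
def annual_value(hours_per_week: float, loaded_rate: float,
                 share_removed: float, weeks_per_year: int = 52) -> float:
    """Yearly dollar value of the admin time an automation removes."""
    return hours_per_week * weeks_per_year * loaded_rate * share_removed


def year_one_roi_pct(value: float, upfront: float, monthly_running: float) -> float:
    """ROI (%) = (annual value - year-one cost) / year-one cost x 100."""
    cost = upfront + 12 * monthly_running
    return (value - cost) / cost * 100


def payback_months(value: float, upfront: float, monthly_running: float) -> float:
    """Upfront cost divided by monthly value net of running costs."""
    net_monthly = value / 12 - monthly_running
    if net_monthly <= 0:
        return float("inf")  # the build never pays back
    return upfront / net_monthly


UPFRONT, RUNNING = 30_000, 400
# "base" matches the worked example; the other two shares are assumptions.
scenarios = {"conservative": 0.40, "base": 0.60, "optimistic": 0.70}

for name, share in scenarios.items():
    value = annual_value(40, 45, share)
    roi = year_one_roi_pct(value, UPFRONT, RUNNING)
    months = payback_months(value, UPFRONT, RUNNING)
    print(f"{name:>12}: value ${value:,.0f}, ROI {roi:.0f}%, payback {months:.1f} months")
```

The base case reproduces the 61% and roughly seven months worked out above. The decision should rest on the conservative row.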
Why outcome metrics beat vanity metrics
This is the line that separates measurement from theatre. MIT Sloan Management Review argues that technical model metrics such as precision, recall and lift give "no direct reading on the absolute business value" of an AI system, and that "the focus should be on business metrics, such as revenue, profit, savings, and number of customers acquired."
Translate that for a small business. Logins, messages sent, prompts run, documents processed and "engagement" are vanity metrics. They tell you the tool is busy. They do not tell you the business is better off. A chatbot can answer ten thousand questions and still deflect nothing if customers re-contact you anyway. Tie every metric back to a baseline outcome: tickets actually resolved without a human, hours genuinely returned, quotes that turned into revenue. If a number cannot be traced to money, time or a quality outcome a customer would notice, treat it as diagnostic, not as proof of return.
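The chatbot case is easy to make concrete. In the sketch below, the `Conversation` record and its two flags are hypothetical: we assume your support tool can tell you whether the bot answered and whether the same customer came back about the same issue within whatever window you choose.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    bot_answered: bool
    recontacted: bool  # customer returned about the same issue (window is your choice)


def answered_count(conversations: list[Conversation]) -> int:
    """Vanity metric: how busy the bot was."""
    return sum(c.bot_answered for c in conversations)


def resolved_rate(conversations: list[Conversation]) -> float:
    """Outcome metric: share of conversations the bot closed with no re-contact."""
    if not conversations:
        return 0.0
    resolved = sum(c.bot_answered and not c.recontacted for c in conversations)
    return resolved / len(conversations)


# Illustrative data: the bot answers everything, and every customer comes back anyway.
busy_but_useless = [Conversation(bot_answered=True, recontacted=True) for _ in range(10)]

print("answered:", answered_count(busy_but_useless))
print("resolved rate:", resolved_rate(busy_but_useless))
```

The first number looks like success on a dashboard. The second is the one to compare against the baseline.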
Process change is usually what converts a tool into a result. Buying AI and bolting it onto a broken workflow rarely moves the lagging numbers; redesigning the workflow around it does. That is why we plan the process and the build together rather than treating AI as a feature you switch on. Our note on planning AI builds walks through scoping the workflow first so the metrics have something real to measure.
A worked example: outcome-led measurement
Our Lead Management System build shows the discipline in practice. The baseline was painful and specific: leads handled manually across disconnected tools, with reporting cobbled together by hand. The system we built saved more than 1,500 hours per month and auto-generates the reports that used to eat days.
Notice how those figures behave. "1,500+ hours per month" is a hard return measured against a real before-state, the kind of number you can multiply by a loaded rate to get cost avoided. "Auto-generated reports" is partly hard (hours no longer spent compiling) and partly soft (managers see current numbers instead of stale ones, so decisions improve). We report both, kept separate. And we resisted the vanity version of the story, which would have boasted about messages processed or workflows triggered. Volume was never the point. Hours returned and reporting that runs itself were the point, because those are the outcomes the owner could feel and bank.
Attribution, over-claiming, and the limits of measurement
Two honest cautions, because credibility is the only currency that matters here.
First, attribution. If revenue rises after an AI build, the AI rarely deserves all the credit. A new salesperson, a seasonal lift or a price change may share it. This is why we use the word "influenced" for revenue rather than "caused", and why a baseline plus a sensible comparison window beats a triumphant before-and-after screenshot. Where you can, change one thing at a time.
Second, over-claiming. Some value is genuinely indirect and slow, exactly as the RBA found. Resist the urge to convert every soft benefit into a dollar figure to make a slide look better. A measured 61% return you can defend is worth more than a fictional 300% no one believes. The point of measurement is not to win an argument; it is to decide, with eyes open, whether to scale, adjust or stop.
The Enki approach: a disciplined Evaluate step
We treat measurement as a named stage, not an afterthought. The NIST AI Risk Management Framework defines a Measure function that "employs quantitative, qualitative, or mixed-method tools, techniques, and methodologies to analyze, assess, benchmark, and monitor AI risk and related impacts." We borrow that posture and apply it to value, not just risk: baseline before we build, instrument the system to capture leading and lagging indicators, then evaluate against the before-state at a set date.
This matters because the gap between adopting AI and capturing value is wide. Deloitte Access Economics found two-thirds of Australian SMBs use AI, but just 5% are fully enabled to realise its benefits, with $44 billion in potential GDP on offer if more of them advanced even one rung. The businesses that move from the two-thirds to the 5% are not the ones with the cleverest models. They are the ones who measured, redesigned the process, and kept what worked.
We cannot promise a guaranteed return, and we would not trust anyone who does. What we offer is the discipline that makes returns visible and defensible: a baseline, a small honest ROI calculation, outcome metrics over vanity ones, and a clear-eyed evaluate step that is willing to say a build is not paying off. Done that way, an AI project stops being a leap of faith and becomes a decision you can stand behind.