Most automation promises the same thing: set up the system, let it run, watch the output arrive. For low-stakes, repetitive work, that promise holds. For decisions that touch money, law, safety, or a customer's trust, removing people from the process is how organisations end up paying for a mistake at scale. Human in the loop is the design choice that keeps a person in the decision, where their judgement still counts.
This article defines human in the loop, explains where full automation predictably fails, sets out how to design oversight that works, and looks at why the Australian government now treats human oversight as recommended practice. Two recent local cases run through it as evidence, one of which ended in a settlement worth A$1.872 billion.
What Is Human in the Loop?
Human in the loop (HITL) is a design pattern in which a person reviews, approves, corrects, or can override an automated or AI system's output before it takes effect or feeds the next step. The machine does the volume work. The person holds the judgement, the accountability, and the authority to stop a bad decision before it lands.
The term comes from control systems and machine learning, where it described keeping a human operator inside a feedback loop rather than letting the process run closed. In practice today it covers any arrangement where automation proposes and a person disposes. A model drafts a reply, a person sends it. An algorithm flags a transaction, an analyst decides. A system calculates a debt, an officer checks it before a letter goes out.
It helps to separate three arrangements, because the difference decides how much can go wrong. Human in the loop means a person is inside the decision: nothing consequential happens without their sign-off. Human on the loop means a person supervises and can intervene, but the system acts on its own unless stopped. Human out of the loop means full automation with no routine human checkpoint. The further a design moves from in the loop towards out of it, the faster the system runs, and the more a single error compounds before anyone notices.
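The practical difference between the three arrangements is what happens when nobody responds. The Python sketch below makes that default explicit. The names `OversightMode` and `action_proceeds`, and the `approved` and `vetoed` flags, are hypothetical, chosen for illustration rather than taken from any framework.

```python
from enum import Enum


class OversightMode(Enum):
    """The three arrangements described above (hypothetical labels)."""
    IN_THE_LOOP = "in"        # nothing consequential happens without sign-off
    ON_THE_LOOP = "on"        # acts by default; a supervisor can stop it
    OUT_OF_THE_LOOP = "out"   # full automation, no routine human checkpoint


def action_proceeds(mode: OversightMode,
                    approved: bool = False,
                    vetoed: bool = False) -> bool:
    """Return True if an automated action should take effect.

    `approved`: a person actively signed off.
    `vetoed`: a person actively stopped the action.
    Both are hypothetical inputs for illustration.
    """
    if mode is OversightMode.IN_THE_LOOP:
        # Silence is not consent: wait for an explicit approval.
        return approved and not vetoed
    if mode is OversightMode.ON_THE_LOOP:
        # Silence is consent: run unless someone intervenes.
        return not vetoed
    # Out of the loop: human input is never consulted.
    return True


for mode in OversightMode:
    print(mode.name, action_proceeds(mode))
```

Called with no human input at all, only the in-the-loop mode holds the action back. That default, rather than the mere presence of a person somewhere in the process, is what separates a checkpoint from a supervisor who may not be watching.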
Why AI Needs Human Oversight
The case for human oversight rests on what current AI cannot do reliably. Large language models generate fluent, confident text whether or not the underlying facts are correct. A hallucination, the term for a model stating something false as if it were true, does not look like an error. It looks exactly like a correct answer. That is what makes it dangerous in any process without a reviewer.
The clearest recent illustration is local. In 2025, Deloitte Australia produced a 237-page report for the federal Department of Employment and Workplace Relations, reviewing an automated welfare-compliance system. The report contained fabricated academic references, false citations, and a made-up quote attributed to a Federal Court judge. Deloitte confirmed it had used a generative AI model, Azure OpenAI GPT-4o, and agreed to refund part of its roughly A$440,000 fee. The errors were caught by a university researcher reading the footnotes, not by the production process. A reviewer in the loop, checking that cited sources existed, would have stopped the report before it reached a government department.
The deeper point is about accountability. An AI system cannot be held responsible for a decision. It cannot be questioned in a hearing, cannot weigh fairness against a rule, and cannot recognise when a case sits outside the pattern it was trained on. People can. Human in the loop AI keeps the responsibility with someone who can exercise it. That is not a brake on the technology. It is the condition that makes the technology safe to use on anything that matters.
We hold a plain view from our own work: the more capable a model looks, the more tempting it is to skip the check, and the more expensive the eventual miss. Fluency is not accuracy. A confident wrong answer is worse than an obvious one, because it survives a quick glance.
Where Full Automation Fails
Full automation fails most reliably in a specific zone: high-stakes decisions, applied to individual people, where the cost of a wrong answer is borne by the person and not the system. Australia has the defining case study.
The Robodebt scheme replaced manual debt calculation with the automated Online Compliance Intervention from July 2016. Wikipedia's account describes it plainly as "an automated data-matching technique with less human oversight", one that issued computer-generated debt notices using a method of income averaging that was later found unlawful. The scheme raised debts against hundreds of thousands of people. It ended in a Federal Court settlement worth A$1.872 billion, approved on 11 June 2021, and a 2023 Royal Commission condemned it as a costly failure of public administration in both human and economic terms.
The phrase "less human oversight" is the whole lesson. The fault was not a single broken calculation. It was a design that removed the people who would previously have looked at a debt that made no sense and stopped the letter. Income averaging produced figures that a human officer, seeing the individual case, would have questioned. The system did not question. It posted.
Three conditions make a process a poor fit for full automation:
- Irreversible or high-cost outcomes. A wrongly sent newsletter is recoverable. A wrongly issued debt, a denied claim, or a public report with fabricated citations is not, or not cheaply.
- Edge cases and ambiguity. Automation excels at the common case and fails at the exception. Real life is full of exceptions, and they are exactly the cases where a wrong answer hurts most.
- Decisions about people. When the output affects someone's money, rights, health, or reputation, fairness and context matter, and both require judgement the system does not have.
Agentic AI, systems that chain multiple steps and take actions on their own, raises the stakes again. When a model drafts and then acts, an early error propagates through every step that follows. The faster and more autonomous the system, the more it needs a defined point where a person can see what it is about to do and say no. For a fuller treatment of the limits, see our piece on what AI cannot do.
How to Design Human in the Loop
Good human in the loop design is deliberate. Bad versions either check everything, which defeats the point of automation, or check nothing meaningful, which defeats the point of oversight. The work is deciding what to review, when, and by whom.
Start by sorting decisions by stakes and reversibility. Low-stakes, reversible, high-volume tasks can run with humans on the loop or out of it: spot-check the output, monitor for drift, intervene if quality slips. High-stakes or irreversible decisions need a human in the loop, with explicit sign-off before the action takes effect. Most processes are a mix, so the design names which decisions sit in which tier rather than applying one rule to everything.
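A minimal sketch of that sorting step, assuming each decision can be described by two yes/no judgements: whether it is high-stakes and whether it is reversible. The `Decision` and `Tier` types and the example decisions are hypothetical. A real process would also weigh volume and might keep a separate tier for fully automated work, which this sketch folds into the on-the-loop tier.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HUMAN_IN_THE_LOOP = "explicit sign-off before the action takes effect"
    HUMAN_ON_THE_LOOP = "spot-check, monitor for drift, intervene if quality slips"


@dataclass(frozen=True)
class Decision:
    name: str
    high_stakes: bool   # touches money, law, safety, or a customer's trust
    reversible: bool    # can a wrong result be undone cheaply?


def assign_tier(decision: Decision) -> Tier:
    # High stakes OR irreversibility is enough on its own to require sign-off.
    if decision.high_stakes or not decision.reversible:
        return Tier.HUMAN_IN_THE_LOOP
    # Only low-stakes, reversible work runs with a person on the loop.
    return Tier.HUMAN_ON_THE_LOOP


# Hypothetical decisions inside one mixed process.
process = [
    Decision("tag incoming support email", high_stakes=False, reversible=True),
    Decision("send newsletter", high_stakes=False, reversible=True),
    Decision("issue debt notice", high_stakes=True, reversible=False),
]

for d in process:
    tier = assign_tier(d)
    print(f"{d.name}: {tier.name} ({tier.value})")
```

The rule is deliberately asymmetric: either condition alone is enough to require sign-off, so a decision has to be both low-stakes and reversible before a person steps back.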
Then decide where the checkpoint sits. There are three common positions:
- Before action (gate): the system proposes, a person approves, then it proceeds. Use this for irreversible or high-cost steps.
- During (monitor and intervene): the system runs, a person supervises a live dashboard and can pause or correct. This is human on the loop, suited to fast, high-volume flows.
- After (review and feedback): outputs are sampled and corrected after the fact, and the corrections improve the model. This is the loop in human in the loop machine learning, where human-labelled examples and reviewed outputs train the next version.
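The before-action gate and the after-the-fact review can each be sketched in a few lines. This is an illustrative Python sketch, not a reference implementation: `Proposal`, `gated_execute`, and `sample_for_review` are hypothetical names, the approver is passed in as a plain function standing in for a real review interface, and the during (monitor and intervene) position is left out because it depends on a live dashboard.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Proposal:
    action: str
    detail: str


def gated_execute(proposal: Proposal,
                  approve: Callable[[Proposal], bool],
                  execute: Callable[[Proposal], str]) -> str:
    """Before-action gate: the system proposes, a person approves, then it proceeds."""
    if not approve(proposal):
        # Nothing takes effect without sign-off; the proposal is held.
        return f"held: {proposal.action}"
    return execute(proposal)


def sample_for_review(outputs: List[str], rate: float, seed: int = 0) -> List[str]:
    """After-the-fact review: draw a random share of outputs for a person to check.

    Corrections made on this sample could feed the next model version (not shown).
    """
    if not outputs:
        return []
    k = min(len(outputs), max(1, round(len(outputs) * rate)))
    return random.Random(seed).sample(outputs, k)


# Stand-in for a human reviewer: holds anything flagged as unverified.
def reviewer(p: Proposal) -> bool:
    return "unverified" not in p.detail


print(gated_execute(Proposal("send debt letter", "amount unverified"),
                    approve=reviewer,
                    execute=lambda p: f"done: {p.action}"))
print(sample_for_review([f"reply-{i}" for i in range(200)], rate=0.05))
```

Passing the approver in as a function keeps the gate independent of how approval is collected, whether that is a review screen, an email, or a ticket. The fixed `seed` only makes the sample repeatable for demonstration.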
Make the human's job possible. Oversight fails when the reviewer cannot actually judge the output: too many items, too little time, no visibility into why the system decided what it did, or a default that nudges them to click approve. Effective design gives the person the reasoning behind a recommendation, surfaces the uncertain cases for closer attention, keeps the review load realistic, and logs every decision so the process can be audited later. A reviewer who rubber-stamps a thousand approvals an hour is not providing oversight. They are a formality.
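The sketch below, again hypothetical Python, puts four of those requirements into a review queue: it shows the reviewer the system's reasoning, orders cases so the least confident come first, caps each batch at a realistic load, and logs every decision. It assumes the automated system supplies a confidence score between 0 and 1 and a plain-language reason for each recommendation; many systems do not, and producing those reliably is a design task of its own. `ReviewQueue`, `Recommendation`, and the example cases are invented for illustration.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Recommendation:
    case_id: str
    suggestion: str
    reasoning: str      # why the system suggested this; shown to the reviewer
    confidence: float   # assumed: a score in [0, 1] supplied by the system


@dataclass
class ReviewQueue:
    daily_capacity: int                          # realistic load for one reviewer
    audit_log: List[dict] = field(default_factory=list)

    def next_batch(self, recs: List[Recommendation]) -> List[Recommendation]:
        # Least-confident cases first, capped at what one person can judge.
        # Anything beyond the cap stays pending; nothing is auto-approved here.
        ranked = sorted(recs, key=lambda r: r.confidence)
        return ranked[: self.daily_capacity]

    def record(self, rec: Recommendation, reviewer: str,
               approved: bool, note: str = "") -> None:
        # Log who decided what, when, and on what reasoning, for later audit.
        self.audit_log.append({
            "case_id": rec.case_id,
            "suggestion": rec.suggestion,
            "reasoning_shown": rec.reasoning,
            "reviewer": reviewer,
            "approved": approved,
            "note": note,
            "decided_at": datetime.now(timezone.utc).isoformat(),
        })


queue = ReviewQueue(daily_capacity=2)
recs = [
    Recommendation("A-101", "approve claim", "matches policy rules", 0.97),
    Recommendation("B-202", "raise debt", "income averaged across the year", 0.41),
    Recommendation("C-303", "deny claim", "missing document", 0.66),
]
for rec in queue.next_batch(recs):   # B-202 then C-303; A-101 waits
    print(rec.case_id, rec.suggestion, "|", rec.reasoning)

queue.record(recs[1], reviewer="officer-1", approved=False,
             note="averaging does not reflect this person's actual income")
print(json.dumps(queue.audit_log, indent=2))
```

Cases beyond the cap are not approved by default; they wait for the next batch. When demand exceeds review capacity, the backlog grows visibly instead of the review quietly thinning out.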
Automation and oversight are not opposites. The aim is to automate the volume and reserve human attention for the cases where judgement changes the outcome. Done well, people spend less time on routine work and more on the decisions that genuinely need them.
The Australian Angle: Oversight as Recommended Practice
In Australia, human oversight is no longer just good sense. It is government-recommended practice. The Department of Industry, Science and Resources published a Voluntary AI Safety Standard, added to the OECD.AI Policy Navigator on 9 July 2025, setting out ten voluntary guardrails for safe and responsible AI use. Guardrail 5 reads, verbatim: "Enable human control or intervention in an AI system to achieve meaningful human oversight across the life cycle." The word "meaningful" matters. A checkbox is not oversight; the standard asks for control a person can actually exercise.
The framing aligns with a widely cited international reference. The United States National Institute of Standards and Technology released its AI Risk Management Framework on 26 January 2023, intended for voluntary use to build trustworthiness into the design, development, use, and evaluation of AI systems. It is organised around four functions: Govern, Map, Measure, and Manage. Both the Australian and the US guidance say the same thing in different words: responsible AI embeds human judgement across the whole lifecycle, not as an afterthought bolted on at the end.
Australia learned this the hard way. Robodebt and the Deloitte report are not abstract risks. They are local, recent, and expensive demonstrations of what removing or skipping human oversight produces. The guardrail exists because the failures came first. For a business adopting AI now, the standard is a gift: it names the practice that would have prevented both.
The Enki Approach
We build AI into small business operations on one rule: the more a decision matters, the more a person stays in it. That is not caution for its own sake. It is how we keep the systems we build safe to rely on.
In practice, we sort the decisions before we automate any of them. Routine, reversible, high-volume work runs with light human supervision and monitoring. Anything that touches money, a legal obligation, or a customer relationship gets a person in the loop with real sign-off, real visibility, and a logged trail. We design the checkpoint so the reviewer can genuinely judge the output, not just wave it through. The same discipline runs through how we treat internal AI knowledge tools and how we think about a business as a connected system rather than a set of parts to automate in isolation. Human in the loop is not a feature we add. It is the line between AI that helps and AI that quietly costs you.