
Why having a human-in-the-loop doesn't solve everything


Adam Jones

This article explains human-in-the-loop systems: what they are, what they solve, and what they don’t. In a future article we’ll explore a new intervention that can mitigate some of their drawbacks.

What is a human-in-the-loop system?

A human-in-the-loop (HITL) system is one where a human makes the final decision about the action being taken. The ‘loop’ here usually refers to the decision-making process, for example the observe, orient, decide, act (OODA) loop. Having a human in this loop means progressing through the loop depends on them, with the human usually owning the ‘decide’ stage.

For example, imagine an AI system used for targeting airstrikes.

A human-in-the-loop system would surface recommendations to a human, who would then decide whether to proceed. The nature of this human oversight could vary dramatically: from reviewing each strike in detail to skimming batches of a thousand.

A human-out-of-the-loop system would not require human approval before taking actions. In this case the system might launch strikes automatically without anyone checking what it is doing.
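To make the distinction concrete, here is a minimal Python sketch (not drawn from any real system; `get_recommendation`, `ask_human_to_approve` and `execute` are hypothetical placeholders). The only difference between the two versions is whether execution is gated on explicit human approval:

```python
# Minimal sketch contrasting human-in-the-loop with human-out-of-the-loop.
# All names here are hypothetical placeholders for illustration only.

def get_recommendation(model, situation):
    """The AI system proposes an action."""
    return model.recommend(situation)

def human_in_the_loop(model, situation, ask_human_to_approve, execute):
    """The human owns the 'decide' stage: nothing happens without approval."""
    recommendation = get_recommendation(model, situation)
    if ask_human_to_approve(recommendation):
        execute(recommendation)

def human_out_of_the_loop(model, situation, execute):
    """No human gate: the recommendation is acted on automatically."""
    recommendation = get_recommendation(model, situation)
    execute(recommendation)
```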

What is a human-on-the-loop system?

Human-on-the-loop systems (also known as human-over-the-loop systems) take actions autonomously without requiring human approval, but can be interrupted by a human.

In our airstrike targeting system example, a human-on-the-loop system could be one which selects targets and gives humans 60 seconds to cancel the automatic strike. This system would inform humans as to what was happening and give them a genuine chance to stop the strike.
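As a rough illustration of that cancellation window, here is a minimal Python sketch, assuming hypothetical `execute` and `cancel_requested` callables (nothing here reflects how a real targeting system is built):

```python
import time

# Human-on-the-loop sketch: the action goes ahead automatically unless a
# human intervenes within the window. All names are illustrative placeholders.

def act_with_cancellation_window(recommendation, execute, cancel_requested,
                                 window_seconds=60):
    """Notify the human, wait out the window, then act unless cancelled."""
    print(f"Planned action: {recommendation} (cancel within {window_seconds}s)")
    deadline = time.monotonic() + window_seconds
    while time.monotonic() < deadline:
        if cancel_requested():  # e.g. a button press or kill switch
            print("Cancelled by human reviewer.")
            return
        time.sleep(0.5)  # poll for a cancellation signal
    execute(recommendation)
```

The key design point is that the default is to act: the human can interrupt, but the system does not wait for their approval.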

Human-on-the-loop systems are more common in fast-paced domains. For example, most of today’s autonomous vehicles are human-on-the-loop systems: they’ll continue driving autonomously, but do give humans the opportunity to stop them and take over.

Mixed systems

There is also scope for systems that combine these approaches, which usually becomes apparent when you look at the AI systems that make them up at different levels of granularity.

For example, an autonomous vehicle might have subsystems that are (see the sketch after this list):

  • human-out-of-the-loop: the AI models that detect where other cars are on the road based on radar data. There’s no way for the driver to override these outputs.
  • human-on-the-loop: the driving system itself, which the driver is expected to monitor and take over if something is going wrong.
  • human-in-the-loop: the navigation system that suggests a few different routes to your destination, and allows you to select one for the car to follow.
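One way to picture this composition (purely illustrative, not how any real vehicle’s software is organised) is to label each subsystem with its oversight mode:

```python
from enum import Enum

# Illustrative labelling of subsystems by oversight mode; the subsystem
# names are hypothetical.
class OversightMode(Enum):
    HUMAN_OUT_OF_THE_LOOP = "no human override"
    HUMAN_ON_THE_LOOP = "human can interrupt"
    HUMAN_IN_THE_LOOP = "human must approve"

vehicle_subsystems = {
    "radar_object_detection": OversightMode.HUMAN_OUT_OF_THE_LOOP,
    "driving_controller": OversightMode.HUMAN_ON_THE_LOOP,
    "route_navigation": OversightMode.HUMAN_IN_THE_LOOP,
}

for name, mode in vehicle_subsystems.items():
    print(f"{name}: {mode.value}")
```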

Why use a human-in-the-loop?

AI systems are already involved in making important decisions. For example, Palantir Gotham is a military decision-making platform leveraging AI, and its marketing materials state ‘Gotham's targeting offering supports soldiers with an AI-powered kill chain.’

People are deploying AI systems in more places. McKinsey reports 72% of companies have adopted AI in at least one function as of early 2024, up from 55% in 2023. Additionally, organisations are using AI in more business functions than in previous years.

People are trying to build even more powerful AI systems. OpenAI states they “will attempt to directly build safe and beneficial AGI”, which they define as “highly autonomous systems that outperform humans at most economically valuable work”. Anthropic “expect rapid AI progress and very large impacts from AI” which they expect “would be very disruptive, changing employment, macroeconomics, and power structures both within and between nations”.

These three factors are likely to result in AI systems making a greater number of important decisions in the future. Increased capabilities could also make deployment even faster.

However, the AI systems making these important decisions are likely to make mistakes. We train most AI systems by showing them lots of examples and having them learn patterns from this data. This approach has worked remarkably well so far, but it will almost always result in AI systems that occasionally make mistakes.

A human-in-the-loop could catch some mistakes. A human could review the outputs of the AI model before decisions are made, and thus prevent harms caused by acting on faulty outputs.

Why might human-in-the-loop not work?

Having a human in the loop is not a silver bullet, although it can be useful as one piece of a defence-in-depth strategy against some AI risks. Notably:

It’s hard to catch mistakes

Humans may struggle to effectively catch AI mistakes, especially as systems become more complex. AI can process vast amounts of data and make decisions based on intricate patterns that humans might not easily understand or verify. Additionally, AI errors might be subtle or occur in edge cases that aren't immediately apparent to human reviewers.


It’s sometimes infeasible

Some AI systems perform critical functions where having a human in the loop simply isn’t possible, because they require action too quickly or the volume of decisions is too great for humans to review. For example:

  • Automated emergency braking systems: detecting pedestrians and slowing a vehicle to avoid or reduce the severity of a collision. This is a safety-critical system where the decision-making speed required means a human can’t be in the loop.
  • Content moderation: AI filtering vast amounts of social media content for policy violations. The sheer volume makes comprehensive human review unfeasible.

Some problems might not have obviously ‘correct’ answers. For example, people have different values, so they would come to different answers to ‘how much risk to civilians would you allow to capture a terrorist?’. Even questions that aren’t value judgements can be hard to evaluate, like ‘what government policies will lead to the most economic growth?’. This makes it very difficult to argue whether a system’s reasoning contains mistakes.

It doesn’t prevent all AI harms

Human-in-the-loop is a technique that can be used by actors genuinely wanting to reduce harm. However, these are not the only actors in the AI space.

For example, rogue nation states or terrorist organisations may try to use AI systems to cause harm intentionally. If these actors have control of an AI system, human-in-the-loop might actually make things worse (for example, if that human is optimising for making sure the AI system is accurately picking the most harmful options).

Even where it’s a good fit, human reviewers are likely to get complacent. AI systems are likely to get more accurate over time. This means more recommendations will be correct (yay!), but if 99% of decisions are correct, humans will become conditioned to approve almost everything. I think this will be one of the most likely root causes of failure in human-in-the-loop systems.1 For example, Israel’s Lavender system directed military strikes, but an operative who carried out human reviews explained: “At first, we did checks to ensure that the machine didn’t get confused. But at some point we relied on the automatic system [...] I would invest 20 seconds for each target at this stage, and do dozens of them every day. I had zero added value as a human, apart from being a stamp of approval.”
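To make the complacency problem concrete, here is some illustrative arithmetic (the review volume and accuracy figures are assumptions, not data from Lavender or any other system). The more accurate the system, the fewer genuine errors a reviewer ever sees, which is exactly the condition under which rubber-stamping sets in:

```python
# Illustrative arithmetic only: assumed review volume and accuracies.
reviews_per_day = 200

for accuracy in (0.90, 0.99, 0.999):
    errors_per_day = reviews_per_day * (1 - accuracy)
    days_between_errors = 1 / errors_per_day
    print(f"accuracy {accuracy:.1%}: ~{errors_per_day:.1f} errors/day "
          f"(one roughly every {days_between_errors:.2f} days)")
```

At 99.9% accuracy, a reviewer doing 200 reviews a day would see a genuine error only once every five days or so, making it very easy to slip into approving everything.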

How can we mitigate this?

In the next article, I cover a way to prevent reviewer complacency in human-in-the-loop systems.

Footnotes

  1. Other likely root causes include:

    • Safety culture problems where people don’t feel able to properly review or object to AI decisions, e.g. because of pressure from senior leadership to rush or approve decisions.
    • Applying human-in-the-loop where the decisions are just too hard for a human to review. This might also apply to systems where humans don’t have sufficient context or tools to review the decisions effectively.
    • Communications failures where a human intends to reject a recommendation, but the action is carried out anyway. For example, a system where the decision on whether to follow a recommendation is relayed over a separate channel, e.g. by radio, possibly via multiple hops, and gets corrupted along the way. Alternatively, a system with an unintuitive UI that results in the decision being entered incorrectly. Both of these are fairly easy to avoid by building a better technical system, but I still think they’re likely to happen: for example, Citibank lost $500M due to bad UI in 2020, and then managed to accidentally place an order for $444Bn in 2022, again due to bad UI.