6. What goes wrong (and how to build for it)

The honest post. Every series needs one.

Jun 02, 2026

There is a particular kind of confidence that AI agents display when they are wrong.

They don’t hesitate. They don’t add a caveat. They produce a beautifully formatted output with the same serene certainty whether they’ve done the job perfectly or made something up. This is not a bug they will grow out of. It is structural. And it is the main reason “human in the loop” is not a comfort phrase — it is an engineering requirement.

Here is what goes wrong, and what to build to catch it.

Hallucination at the wrong moment

AI models can generate plausible-sounding information that is simply not true. In a low-stakes context, this is annoying. In health communications, where claims need to be traceable to a source, it is a problem.

The most common failure mode in the advisory board agent from last week is Step 4 — the briefing pack generation. The model has been given the faculty bio, the publication abstracts, and the template. In most cases it does a good job. Occasionally, it will add a detail that wasn’t in the source material: a paper that doesn’t exist, a quote that was never said, an expertise that was inferred rather than stated.

The fix is in the prompt design, not the technology. Instructions like “only include information explicitly present in the source documents” and “if a section cannot be completed from the provided material, write ‘insufficient source data’ rather than inferring” reduce but do not eliminate the risk.

The remaining risk is why you still review the packs before they go out. Not because you don’t trust the agent. Because you can’t trust any process — human or automated — without a final check in a regulated environment.

The specification gap

The second most common failure is not a technology failure at all. It’s a specification failure. The agent does exactly what you told it to do, and what you told it to do was subtly wrong.

A prompt that says “pull recent publications” will pull recent publications. But what counts as recent? Relevant to what? In what journals? The agent will make choices on these questions based on what seems most reasonable given the available context. Those choices will sometimes be wrong in ways that matter.

The fix is specificity. “Pull publications from the last three years in journals with an impact factor above 3, filtered for relevance to [specific topic], ranked by citation count” is a better instruction than “pull recent publications.” More work to write. Significantly better output.

What an audit trail actually looks like

In regulated environments, you need to be able to show what happened and why. This is not a new requirement — it’s the same requirement you have for any process that contributes to a regulated output. What changes with agents is that you need to capture it deliberately, because the agent won’t do it automatically.

Most no-code platforms log agent runs. Make sure logging is turned on, that logs are retained for an appropriate period, and that you can export them if needed. For anything touching regulatory submissions, consider whether your audit trail needs to be stored in a validated system.

The question to ask of any agent you build: if something went wrong and you had to show exactly what happened, could you?

Building for graceful failure

The best agent designs assume things will go wrong and plan for it. This means: explicit checkpoints where a human reviews before the agent proceeds. Clear handling for situations the agent wasn’t designed for — not “do your best” but “stop and flag for human review.” Output that makes it easy for a human to spot problems rather than hiding them in well-formatted prose.

The goal is not an agent that never fails. That doesn’t exist. The goal is an agent that fails in a way you can catch.

The thing worth remembering

Agents make certain kinds of mistakes that humans don’t, and humans make certain kinds of mistakes that agents don’t. A well-designed human-agent workflow is not about replacing human oversight. It’s about applying human oversight where it adds the most value, and removing it from the places where it doesn’t.

Agents are good at consistency. They will do the same thing the same way every time, at any hour, without getting tired or distracted. What they cannot do is tell you whether the thing they are doing consistently is the right thing. That call belongs to someone who understands the work — and it always will.

That’s the series. Six posts, from first principles to a working build and the honest accounting of what can go wrong. If you’ve followed along and built something — or tried to and got stuck — I’d like to hear about it.

This is the final post in Building with agents. The next series picks up from here: more complex builds, real-world examples, and what the commercial landscape looks like for those thinking about this seriously.

— Ned

The Irreplaceables

Discussion about this post

Ready for more?