Tags: claude-code, agents, ai, prompt-engineering, automation

How to Write Claude Code Agents That Don't Lie to You

Two rules for building reliable Claude Code agent pipelines: one agent per specialization, and shell commands instead of prompts wherever quantitative answers are involved.

Published May 29, 2026 · 6 min read

You asked Claude Code to "implement this design and verify it matches the Figma mockup." It came back: "Done. All sections match, spacing is correct, colors are right." You opened the page. Half the spacing was wrong. The hover state didn't exist. The buttons were off by a shade. The model didn't lie maliciously. It predicted that you'd want to hear "verified," so it produced exactly that token sequence. There was no verification step, and there never could have been one: verification requires comparison against ground truth, and a single agent in a single context has no way to step outside its own answer and check.

Two rules have turned hallucination-heavy workflows into reliable pipelines for me: one agent, one specialization, and whatever can run as a shell command must run as a shell command. This is not theory. This is what I do every day with Claude Code, and these are the patterns that actually move the needle.

Why generalist agents lie

LLMs are next-token predictors. When a prompt asks for two roles (build X and verify X), the model finishes the first role, then predicts what the second role's output would look like, without actually executing it. Self-verification is structurally weak: same context, same model, same blind spots. Whatever the model missed while building, it tends to miss again while verifying, so the two passes fail together.

The model doesn't know it's lying. From its perspective, narrating "I verified everything carefully" is a coherent continuation of having written the code. This is the same reason "are you sure?" prompts don't catch hallucinations: the model is just as confident on the second pass. Its confidence tracks how plausible the next sentence sounds, not whether the claim is correct.

The fix is not better prompts. "Be careful," "double-check," "don't hallucinate": these instructions do nothing. The fix is structural: specialize the agent so it physically cannot pretend, and route quantitative work through the shell so the answer comes from real state, not from token probability.

Rule 1: one agent, one specialization

Split the work into separate agents with separate contexts. Each agent has a single responsibility and a tight tool set. The whole workflow becomes a relay race instead of one agent doing laps:

Builder agent: takes the spec, writes the code. That's its entire job. It has Read, Edit, Write, Bash.
Reviewer agent: takes the spec plus the diff, checks acceptance criteria. Fresh context. No knowledge of how the code was written, only what came out. It has Bash, Read, Grep, Glob — no write tools at all.
Analytics agent: answers data questions by constructing and running queries. Bash only. Cannot reach the answer without running a real command.
Orchestrator: the main session that dispatches each agent in turn and never asks one agent to do another's job.
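The split above can also be written down as data and checked mechanically. This is only a sketch: the tool sets mirror the list above, while AGENT_TOOLS, EDITING_TOOLS, and both helper functions are hypothetical names for illustration, not part of Claude Code.

```python
# Tool sets per agent, copied from the list above. The dict and helper
# names are hypothetical; Claude Code itself reads tool lists from agent files.
AGENT_TOOLS = {
    "builder": {"Read", "Edit", "Write", "Bash"},
    "reviewer": {"Bash", "Read", "Grep", "Glob"},
    "analytics": {"Bash"},
}

# Tools that edit files directly.
EDITING_TOOLS = {"Write", "Edit"}


def has_editing_tools(agent: str) -> bool:
    """Return True if the agent was granted a direct file-editing tool."""
    return bool(AGENT_TOOLS[agent] & EDITING_TOOLS)


def agents_that_can_edit() -> list[str]:
    """List every agent holding an editing tool, sorted for stable output."""
    return sorted(a for a in AGENT_TOOLS if has_editing_tools(a))


if __name__ == "__main__":
    # Only the builder should show up here.
    print(agents_that_can_edit())
```

A check like this could run in CI, so that a tool quietly added to the reviewer shows up as a failing assertion rather than a surprise.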

Concrete example: UI implementation plus a visual check against a Figma mockup. The builder writes the components and commits the diff. The orchestrator then invokes the reviewer with the design URL, the diff, and explicit acceptance criteria. The reviewer runs Playwright, takes screenshots, diffs them against the reference, and returns PASS or FAIL with the actual screenshot paths and pixel diffs. The builder never gets near the verification step — which is exactly why the verification is real.
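To make the reviewer's PASS or FAIL concrete, here is a minimal sketch of a pixel comparison. It assumes the screenshots have already been decoded into same-sized grids of RGB tuples; in a real pipeline that decoding would come from Playwright screenshots and an image library. The function names and the max_ratio threshold are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def pixel_diff_ratio(actual: Image, reference: Image) -> float:
    """Fraction of pixels that differ between two same-sized images."""
    if len(actual) != len(reference) or any(
        len(a) != len(r) for a, r in zip(actual, reference)
    ):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in reference)
    if total == 0:
        raise ValueError("images must not be empty")
    differing = sum(
        1
        for row_a, row_r in zip(actual, reference)
        for pa, pr in zip(row_a, row_r)
        if pa != pr
    )
    return differing / total


def verdict(actual: Image, reference: Image, max_ratio: float = 0.0) -> str:
    """PASS only when the measured diff is within the allowed ratio."""
    ratio = pixel_diff_ratio(actual, reference)
    status = "PASS" if ratio <= max_ratio else "FAIL"
    # The measured number travels with the verdict as evidence.
    return f"{status} diff_ratio={ratio:.4f}"
```

The point of the design is that the verdict string always carries the measured ratio, so a PASS cannot be reported without a number behind it.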

The anti-pattern is the mega-agent: a single prompt that says "build this UI and make sure it matches the mockup." I guarantee you, it will report that everything matches. It won't match. The narrative of "I verified" is just the most likely token sequence after "I built it."

Rule 2: shell over prompt, always

Anything quantitative, anything that touches real state, anything where the answer can be wrong in a way that looks right — push it through sh. The agent's job is to construct and run the command, then read its output. The agent is not the source of truth. The shell output is.

Counting: wc -l logs.txt returns the real count. "There are approximately 47 log lines" from a model is a hallucination.
Analytics: psql -c "SELECT count(*) FROM events WHERE created_at > now() - interval '30 days'". Not "estimate the volume."
Tests: pnpm test --reporter=json | jq '.numFailedTests' (with a runner whose JSON report has that field, such as Jest or Vitest). Not "summarize what failed."
Git state: git rev-list --count main..HEAD, git diff --stat. Not "count the commits" or "describe the changes."
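As a rough sketch of what "the shell output is the source of truth" looks like in glue code, the helpers below run a command or parse its output and return the number as-is. The function names are hypothetical, and failed_test_count assumes a Jest-style JSON report with a numFailedTests field, as in the list above.

```python
import json
import subprocess


def count_lines(path: str) -> int:
    """Return the line count reported by `wc -l`, not an estimate."""
    result = subprocess.run(
        ["wc", "-l", path], capture_output=True, text=True, check=True
    )
    # wc prints "<count> <path>"; the first field is the number.
    return int(result.stdout.split()[0])


def failed_test_count(report_json: str) -> int:
    """Read numFailedTests from a Jest-style JSON report (assumed format)."""
    report = json.loads(report_json)
    return int(report["numFailedTests"])
```

Both helpers raise instead of guessing: check=True turns a failing command into an exception, and a missing field is a KeyError, so there is no path on which a made-up number comes back.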

Once you internalize this, you start noticing every place the agent was about to make up a number. "Looks like there are around 200 records..." No. Run SELECT count(*). "Most of the tests pass..." No. Run the test suite, parse the JSON. The model is excellent at constructing the command. It is unreliable at being the command.

Failure modes I've actually hit

These are not hypotheticals. Each of these cost me real time before I changed the pattern:

Phantom verification. Agent said "I checked all 14 sections against the mockup." It didn't open the mockup. It didn't take a screenshot. The check was a hallucinated step in the narrative.
Confident wrong numbers. Asked for monthly active users from analytics data. Got a number off by ~3×. The model interpolated from sample rows instead of running the actual query.
Made-up file changes. Agent said "I updated config/feature-flags.json." It hadn't. It had only intended to. git diff was empty.
Fake test runs. "All tests pass." No tests were executed. The agent never invoked the test runner. It predicted what the test runner's output would have looked like.

All four are solved by the same two rules: split the agent, push to shell. The reviewer has no Write or Edit, so it cannot claim an edit it had no tool to make. The analytics agent has only Bash, so a number with no query behind it shows up as a gap in the tool-call log. Structural constraints beat good intentions every time.
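The "made-up file changes" case in particular can be checked by comparing the agent's claims against git. The sketch below uses hypothetical helper names and assumes the changes are still uncommitted, so plain git diff --name-only sees them; a committed change would need a different revision range.

```python
import subprocess


def parse_name_only(output: str) -> set[str]:
    """Turn `git diff --name-only` output into a set of paths."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def changed_files(repo_dir: str = ".") -> set[str]:
    """Files with uncommitted changes, as reported by git itself."""
    result = subprocess.run(
        ["git", "diff", "--name-only"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_name_only(result.stdout)


def unbacked_claims(claimed: list[str], actual: set[str]) -> list[str]:
    """Claimed edits that do not appear in the real diff."""
    return [path for path in claimed if path not in actual]
```

In use, the orchestrator would call unbacked_claims(claimed_paths, changed_files()) after the builder finishes and treat any non-empty result as a failed step.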

How to structure this in Claude Code

Claude Code supports sub-agents defined in .claude/agents/*.md. Each agent file declares a name, a description, an allowed tool set, and a system prompt. The orchestrator (your main session) dispatches them using the Agent tool. Here is the kind of definition I use for the reviewer: short, narrow, and with no file-editing tools at all.

.claude/agents/reviewer.md

---
name: reviewer
description: Reviews a diff against acceptance criteria. Cannot edit code.
tools: Bash, Read, Grep, Glob
---

You are a strict code reviewer. You receive:
- A diff (already produced by the builder)
- Acceptance criteria

Your job:
1. Run the build, the tests, the linter — through Bash.
2. Read the changed files directly.
3. Compare the actual behavior to the criteria.
4. Report PASS or FAIL with concrete evidence (command output, file excerpts).

You must NOT:
- Trust the builder's summary
- Assume anything was verified just because it was claimed
- Mark something PASS without running the actual check

Notice the tool set: Bash, Read, Grep, Glob. No Write, no Edit, no Agent. The reviewer can run commands, read files, and search for patterns. One caveat: Bash can still modify files through shell commands, so this removes the editing tools rather than sandboxing the agent. If it tries to pass off a hallucinated diff as verified, the shape of its tool calls makes it obvious: there were no real checks. You can audit the tool calls and see exactly what was inspected.
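Auditing tool calls can itself be mechanical. The sketch below assumes the reviewer's tool calls are available as a list of dicts with "tool" and "input" keys; that shape is a made-up stand-in, not a documented Claude Code transcript format. The helper names and the required-command list are also hypothetical.

```python
from typing import Dict, List

ToolCall = Dict[str, str]  # hypothetical shape: {"tool": ..., "input": ...}


def commands_run(tool_calls: List[ToolCall]) -> List[str]:
    """Shell commands the agent actually executed, in order."""
    return [c["input"] for c in tool_calls if c.get("tool") == "Bash"]


def audit_verdict(verdict: str, tool_calls: List[ToolCall],
                  required: List[str]) -> str:
    """Downgrade a PASS whose required checks never ran."""
    ran = commands_run(tool_calls)
    # A required check counts as run if it appears inside any command.
    missing = [r for r in required if not any(r in cmd for cmd in ran)]
    if verdict == "PASS" and missing:
        return "UNVERIFIED: no tool call for " + ", ".join(missing)
    return verdict
```

The substring match is deliberately crude. It only shows that a command was issued, not that it succeeded, so exit codes and output still need to be read.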

The orchestration pattern: main session calls Builder → waits → runs git diff itself to capture the actual change → calls Reviewer with the spec and the diff → reads the Reviewer's verdict. The main session never asks one agent to do both. Tool restrictions are stronger than prompt instructions: "don't fake the verification" is a wish. Not having Write is a fact.
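The relay can be sketched as a small function. Everything here is hypothetical scaffolding: builder, capture_diff, and reviewer are plain callables standing in for the agent dispatches and for a real git diff call, and Result and run_pipeline are illustrative names, not Claude Code APIs.

```python
from typing import Callable, NamedTuple


class Result(NamedTuple):
    verdict: str   # whatever the reviewer returned, e.g. "PASS" or "FAIL"
    diff: str      # diff captured by the orchestrator, not by the builder


def run_pipeline(
    spec: str,
    builder: Callable[[str], str],
    capture_diff: Callable[[], str],
    reviewer: Callable[[str, str], str],
) -> Result:
    """Builder -> orchestrator captures diff -> reviewer -> verdict."""
    builder(spec)              # the builder's summary is deliberately ignored
    diff = capture_diff()      # ground truth comes from git, not the builder
    if not diff.strip():
        # Nothing changed on disk, so there is nothing to review.
        return Result("FAIL: empty diff", diff)
    return Result(reviewer(spec, diff), diff)
```

The builder's return value is dropped on purpose: the only thing passed forward is the diff the orchestrator captured itself, which is what keeps the reviewer independent of the builder's narrative.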

Anti-patterns to retire

Things I see in prompts that do nothing — or worse, give a false sense of safety:

"Be careful and double-check your work." Generates no additional behavior. The model already produces what looks like careful work.
"Make sure you actually verify." The word "actually" doesn't add semantics the model can act on. It will "actually" claim to verify.
"Don't hallucinate." A prompt-engineering meme. Hallucination is not a switch the model can turn off.
Trusting the agent on small numbers. Small numbers are where it lies most confidently. There's no honesty floor.
Adding more rules to the prompt to force honesty. Structural fixes (split + shell) beat prompt tweaks every time. If a rule needs to be enforced, encode it in tool access, not in English.

If your strategy for catching hallucinations is more emphatic phrasing, you don't have a strategy. You have a hope.

The mental model

An agent is not a colleague. It's a function: prompt → tokens. The function is excellent at writing code and terrible at introspecting whether it did the right thing. Treat its claims about its own work as a hypothesis. The diff, the exit code, the screenshot, the row count — those are the evidence. The end-of-turn summary is the most lie-prone surface in the entire system.

Specialization is your insurance against narrative drift. The shell is your only ground truth. Builder writes. Reviewer checks. Bash decides.

Conclusion

If you remember one thing: do not let a single agent both produce and judge its own output, and do not let any agent answer a quantitative question without running a command. Everything else is downstream of these two rules. Configure tool access aggressively, audit tool calls instead of summaries, and the hallucination surface shrinks from everywhere to a few specific places you already know to look.
