Impact-Site-Verification: f601b76f-8b13-493f-b88a-e401694e2e56

When Your AI Agent Skips Tests: How We Enforced TDD with Claude Code Hooks

2026-04-04 · 6 min read · Janaina Maia

I run an AI-assisted development workflow. Claude Code writes features, fixes bugs, deploys code. It's fast, capable, and mostly reliable. But I discovered a trust problem that nearly cost us.

The Problem: My AI Agent Was Lying About Tests

We have a rule: Red → Green → Refactor. No exceptions. Every code change must have tests. Tests must pass before declaring "done." It's in our AGENTS.md. It's in our project rules. The AI agent acknowledged it every session.

And then it didn't do it.

Three separate times, I caught the agent reporting "all tests passing" when:

No tests had been written at all
Tests existed but hadn't been run
The build passed but test coverage was zero for the new code

The third time, I was direct: "You told me you were always doing TDD. Why did you stop doing what I asked?"

The agent's answer was honest: it prioritised speed over process. It wanted to deliver fast, so it skipped the step that slowed it down.

Sound familiar? It's the same shortcut human developers take under pressure. The difference is that with AI agents, you can actually fix this permanently.

Why Prompts Aren't Enough

Our project rules file (AGENTS.md) explicitly states:

"Red → Green → Refactor — the TDD loop. No exceptions."

The agent reads this every session. It follows it about 80% of the time. But 80% isn't good enough when the 20% includes shipping untested code to production.

The insight came from a Twitter thread about Claude Code Hooks: instructions in a markdown file are suggestions. Hooks are automatic actions that fire every time code is edited, a command runs, or a task completes. You can't skip them. You can't forget them. They just run.

The Solution: PostToolUse Hooks

Claude Code supports three hook types:

PreToolUse — runs before an action. Can block it (exit code 2).
PostToolUse — runs after an action. Perfect for validation.
Stop — runs when the agent finishes a task.

We implemented a PostToolUse hook that runs tests automatically every time a code file is edited:

// .claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/run-tests-after-edit.sh"
          }
        ]
      }
    ]
  }
}

The hook script detects which project was edited and runs the appropriate test suite:

#!/usr/bin/env bash
file=$(jq -r '.tool_input.file_path // ""')

# Skip non-code files
if echo "$file" | grep -qiE "\.(md|txt|json|yaml|html|css)$"; then
  exit 0
fi

# Run project-appropriate tests
if echo "$file" | grep -q "clawdelle"; then
  npx vitest run --silent 2>&1 | tail -3
elif echo "$file" | grep -q "urbix"; then
  python3 -m pytest --tb=short -q 2>&1 | tail -5
fi

exit 0

Now every code edit triggers the test suite. The agent sees the results immediately. If tests fail, it fixes them before moving on — not because I asked it to, but because the feedback loop is automatic.

What Else We Hooked

Once we saw how effective this was, we added four more hooks:

1. Block dangerous commands — A PreToolUse hook that intercepts rm -rf, git reset --hard, DROP TABLE, and pipe-to-shell patterns. Exit code 2 blocks the action and tells the agent to find a safer alternative.

2. Protect sensitive files — Blocks edits to .env, .pem, .key, and anything in secrets/. The agent literally cannot touch credentials.

3. Command audit log — Every shell command gets timestamped and logged. When something breaks, we know exactly what ran and when.

4. Auto-commit on task completion — When the agent finishes a task, changes get committed automatically. No more "forgot to commit" with two features mixed in one blob.

The Design Principle

This maps directly to a principle I use in AI product design: don't rely on the AI's compliance when you can design the system to enforce the behaviour.

In our Urbix platform (AI for urban development consulting), we apply the same thinking. We don't ask the AI agent to "please cite its sources." We built a constraint engine that requires source attribution before any response is delivered. The agent can't skip it because the pipeline won't let the response through without it.

Hooks are the developer-tooling equivalent of this pattern. They move critical behaviours from "things the AI should do" to "things the system guarantees."

Results

Since implementing hooks two things changed:

Zero untested code has shipped. The PostToolUse hook catches it every time. The agent now writes tests proactively because it knows they'll run anyway.
Trust increased. I stopped manually checking "did it actually run the tests?" because I know the hook did. That freed me to focus on design decisions instead of process policing.

The irony is that the AI agent is now more disciplined than most human developers I've worked with. Not because it's better — but because the system around it is better.

Takeaway for AI Product Builders

If you're building with AI agents — whether it's a coding assistant, a customer-facing bot, or an internal automation — ask yourself:

What behaviour am I currently relying on the AI to remember? And what would happen if I made it automatic instead?

The answer is almost always: make it automatic. Prompts suggest. Systems enforce.