Guides & Sensors: The Two Halves of an AI Coding Harness

Guides shape the agent before it writes a line. Sensors catch what slips through after. Get both right and AI-generated code stops being something you babysit and starts being something you ship.

In the previous article we defined the harness as everything around an AI coding agent except the model, made of two control types: guides (feedforward) and sensors (feedback). This piece is the practical one — what each actually looks like in a real repo, and how to build them so they compound instead of becoming busywork.

Guides: steer before a line is written

A guide is anything that constrains the space of outputs before generation. The principle is simple: every decision you encode as a guide is one the agent no longer has to guess. Humans do this implicitly: a senior engineer "just knows" the house style. Guides make that knowledge explicit and machine-readable.

1. Project conventions, checked into the repo

The single highest-leverage guide is a living conventions file (a CLAUDE.md, AGENTS.md, or similar) at the repo root. Not a wiki nobody reads — a file the agent loads every session. It should answer: how do we name things, where does code live, what patterns do we use, what do we never do. We keep ours short and ruthlessly current; a stale guide is worse than none because it confidently misleads.

2. Architecture documents

Agents are excellent at local correctness and poor at global intent. An architecture doc — even a one-page "here are the layers and how they talk" — prevents the most expensive class of mistake: code that works but violates the system's shape. This is feedforward at the structural level.

3. Scaffolding & bootstrap scripts

If creating a new module, endpoint, or component has a "right shape," encode that shape as a generator the agent runs rather than prose it has to interpret. A bootstrap script is a guide with teeth: it doesn't describe the convention, it produces it.
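
As a rough illustration of a guide that produces the convention rather than describing it, here is a minimal Python sketch of a module generator. The file layout, the templates, and the name scaffold_module are hypothetical; a real repo would substitute its own "right shape."

```python
from pathlib import Path

# Hypothetical house layout for a new module. The file names and templates
# are illustrative assumptions, not a convention taken from this article.
TEMPLATES = {
    "__init__.py": '"""{name} module."""\n',
    "service.py": 'class {cls}Service:\n    """Business logic for {name}."""\n',
    "test_service.py": (
        "from .service import {cls}Service\n\n\n"
        "def test_{name}_service_constructs():\n"
        "    assert {cls}Service() is not None\n"
    ),
}


def scaffold_module(root: Path, name: str) -> list[Path]:
    """Create root/name with the standard files; refuse to overwrite."""
    if not name.isidentifier() or not name.islower():
        raise ValueError(f"module name must be a lowercase identifier: {name!r}")
    target = root / name
    # exist_ok=False makes the script fail loudly instead of clobbering work.
    target.mkdir(parents=True, exist_ok=False)
    cls = "".join(part.capitalize() for part in name.split("_"))
    created = []
    for filename, template in TEMPLATES.items():
        path = target / filename
        path.write_text(template.format(name=name, cls=cls), encoding="utf-8")
        created.append(path)
    return created
```

Calling scaffold_module(Path("src"), "billing_reports") would create the three files with a BillingReportsService class already named and placed, so the agent fills in behaviour instead of guessing at structure.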

Rule of thumb

If you find yourself correcting the same thing in review twice, it's not a review comment — it's a missing guide. Promote it into the harness and you never correct it by hand again.

Sensors: catch what slips through

No guide is perfect, so the second half is observation. A sensor inspects generated code and either blocks it or feeds the problem back so the agent can self-correct. The magic of sensors with agents is the closed loop: a human who gets a failing test fixes it; an agent wired to the same signal does too, automatically, before you ever see the diff.

Computational sensors (fast, deterministic)

Linters & formatters: style and obvious smells. Cheap, run on every save.
Type checkers: a whole category of bugs eliminated before runtime. Strongly typed projects give agents a tighter, more correctable harness.
Test suites: the highest-signal sensor you have. An agent with a fast test suite can verify its own work in a loop.
Structural / architecture fitness checks: "no import from this layer," "bundle under N kB," "p95 under 200 ms." Make the architecture self-enforcing.
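
To make the "no import from this layer" idea concrete, here is a small sketch of a fitness check built on Python's standard ast module. The layer names in FORBIDDEN and the function names are assumptions chosen for illustration, and the check only looks at absolute imports.

```python
import ast

# Hypothetical layering rule: code in a layer may not import from the layers
# listed for it. Layer and package names are illustrative assumptions.
FORBIDDEN = {
    "domain": {"api", "infrastructure"},
    "api": {"infrastructure"},
}


def imported_top_level_names(source: str) -> set[str]:
    """Return the top-level package names a module imports (absolute imports only)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def layer_violations(layer: str, source: str) -> list[str]:
    """List the forbidden layers that `source`, living in `layer`, imports from."""
    banned = FORBIDDEN.get(layer, set())
    return sorted(imported_top_level_names(source) & banned)
```

Run from the test suite, a non-empty result becomes an ordinary failing test, which is the kind of signal an agent can read and act on without a human in between.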

Inferential sensors (semantic, slower)

Some problems no linter catches: a function named for what it used to do, a subtle off-by-one in business logic, a missing edge case. Here an AI code-review agent earns its cost — semantic judgment a deterministic tool can't provide. The trick is to use it where semantics matter and not burn inference cycles on things a type checker already guarantees.

The art isn't adding more checks. It's putting the cheapest sufficient check at each point — computational where you can, inferential only where you must.
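
One way to picture "cheapest sufficient check" is a runner that orders sensors by cost and stops at the first failure, so an inferential review is only paid for once the deterministic checks are clean. The Sensor type, the cost numbers, and run_cheapest_first are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Sensor:
    name: str
    cost: int  # relative cost; the scale is an illustrative assumption
    check: Callable[[str], list[str]]  # returns problems, empty list if clean


def run_cheapest_first(
    sensors: list[Sensor], code: str
) -> tuple[Optional[str], list[str]]:
    """Run sensors in ascending cost and stop at the first one that reports problems.

    Returns (failing sensor name, problems), or (None, []) if everything is green.
    """
    for sensor in sorted(sensors, key=lambda s: s.cost):
        problems = sensor.check(code)
        if problems:
            return sensor.name, problems
    return None, []
```

With a linter at cost 1 and an AI review at cost 100, code that fails the linter never reaches the review step at all.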

Wiring it together: the loop

Guides and sensors aren't a checklist, they're a loop. Generation happens inside the guides; sensors run on the output; failures feed back; the agent corrects; sensors run again. Only when the computational and inferential sensors are green does a human look — and they look at the one thing the harness can't judge: is this the right thing?
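
The loop described above can be sketched in a few lines of Python. Here generate stands in for the agent and sense for whatever sensors the repo runs; both are placeholders supplied by the caller, and the max_rounds cap is an assumption added so the loop always terminates.

```python
from typing import Callable

Generate = Callable[[str, list[str]], str]  # (task, feedback) -> code
Sense = Callable[[str], list[str]]          # code -> problems (empty means green)


def harness_loop(
    task: str, generate: Generate, sense: Sense, max_rounds: int = 3
) -> tuple[str, list[str]]:
    """Generate, sense, feed failures back, and repeat until green or out of rounds.

    Returns (code, problems). An empty problems list means the sensors are
    green and the change is ready for a human to judge intent.
    """
    feedback: list[str] = []
    code = ""
    for _ in range(max_rounds):
        code = generate(task, feedback)  # the agent sees the previous failures
        feedback = sense(code)           # sensors run on the new output
        if not feedback:
            break
    return code, feedback
```

If the rounds run out with problems still reported, those problems come back to the caller rather than being hidden, so a person sees exactly where the harness gave up.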

That's the payoff. The harness handles maintainability and architecture on autopilot, so your scarce, expensive human attention lands on behaviour and intent — the work that actually needs a person.

Start small, grow the harness

You don't build this all at once. Start with the two highest-leverage pieces, a real conventions file (guide) and a fast test suite in CI (sensor), and let the harness grow every time review catches something a control could have caught. Over months it compounds into a system where shipping AI-generated code feels less like supervising and more like delegating.

Next: why timing matters — keeping quality left across the whole development lifecycle.
