In the previous article, we defined the harness as everything around an AI coding agent except the model, built from two types of control: guides (feedforward) and sensors (feedback). This piece is the practical follow-up: what each looks like in a real repo, and how to build them so they compound instead of becoming busywork.
Guides: steer before a line is written
A guide is anything that constrains the space of outputs before generation. The principle is simple: every decision you encode as a guide is a decision the agent no longer has to guess. Humans do this implicitly: a senior engineer "just knows" the house style. Guides make that knowledge explicit and machine-readable.
1. Project conventions, checked into the repo
The single highest-leverage guide is a living conventions file (a CLAUDE.md, AGENTS.md, or similar) at the repo root. Not a wiki nobody reads — a file the agent loads every session. It should answer: how do we name things, where does code live, what patterns do we use, what do we never do. We keep ours short and ruthlessly current; a stale guide is worse than none because it confidently misleads.
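Staleness is the failure mode worth guarding against, and part of it can be checked mechanically. Below is a minimal Python sketch, assuming the conventions file is named `AGENTS.md` and writes repo paths in backticks. The function name `stale_references` and the path heuristic are hypothetical, not a feature of any agent tool.

```python
import re
from pathlib import Path

# Assumption: the conventions file quotes repo paths in backticks, e.g. `src/api/`.
PATH_PATTERN = re.compile(r"`([\w./-]+)`")


def stale_references(conventions_file: Path, repo_root: Path) -> list[str]:
    """Return backtick-quoted paths in the conventions file that no longer exist."""
    text = conventions_file.read_text(encoding="utf-8")
    missing = []
    for candidate in PATH_PATTERN.findall(text):
        # Heuristic: only treat tokens with a slash or a dot as paths,
        # so bare words like `snake_case` are skipped.
        if "/" not in candidate and "." not in candidate:
            continue
        if not (repo_root / candidate).exists():
            missing.append(candidate)
    return missing
```

Run it in CI as `stale_references(Path("AGENTS.md"), Path("."))` and fail the build if the list is non-empty. It only catches dead paths, not outdated advice, so it supports the habit of keeping the file current rather than replacing it.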
2. Architecture documents
Agents are excellent at local correctness and poor at global intent. An architecture doc — even a one-page "here are the layers and how they talk" — prevents the most expensive class of mistake: code that works but violates the system's shape. This is feedforward at the structural level.
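One option, sketched here in Python under assumed layer names (`api`, `domain`, `infra`), is to keep the layer rules as data so the same source can render the prose for the doc and feed an automated check. `LAYERS`, `may_depend_on`, and `describe` are illustrative names, not a standard format.

```python
# Hypothetical three-layer architecture, written as data instead of prose.
# Each layer lists the other layers it is allowed to import from.
LAYERS: dict[str, set[str]] = {
    "api": {"domain"},    # HTTP handlers call into the domain layer
    "domain": set(),      # business logic depends on no other layer
    "infra": {"domain"},  # adapters implement domain interfaces
}


def may_depend_on(source_layer: str, target_layer: str) -> bool:
    """True if code in source_layer is allowed to import from target_layer."""
    if source_layer == target_layer:
        return True
    return target_layer in LAYERS.get(source_layer, set())


def describe() -> str:
    """Render the rules as plain text for the architecture doc or an agent prompt."""
    lines = []
    for layer, deps in LAYERS.items():
        allowed = ", ".join(sorted(deps)) or "nothing"
        lines.append(f"{layer} may import from: {allowed}")
    return "\n".join(lines)
```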
3. Scaffolding & bootstrap scripts
If creating a new module, endpoint, or component has a "right shape," encode that shape as a generator the agent runs rather than prose it has to interpret. A bootstrap script is a guide with teeth: it doesn't describe the convention, it produces it.
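A minimal sketch of such a generator in Python, assuming a hypothetical house shape of `src/<name>/__init__.py`, `src/<name>/service.py`, and `tests/test_<name>.py`. The templates and the `scaffold_module` name are placeholders to adapt, not a prescribed layout.

```python
import re
from pathlib import Path

# Hypothetical house shape: a package, a service file, and a matching test.
# The test template assumes `src/` is on the import path (e.g. via pytest config).
TEMPLATES = {
    "src/{name}/__init__.py": '"""The {name} module."""\n',
    "src/{name}/service.py": (
        '"""Business logic for {name}."""\n\n\n'
        "def handle(request: dict) -> dict:\n"
        "    raise NotImplementedError\n"
    ),
    "tests/test_{name}.py": (
        "from {name}.service import handle\n\n\n"
        "def test_handle_is_wired_up():\n"
        "    assert callable(handle)\n"
    ),
}


def scaffold_module(name: str, root: Path) -> list[Path]:
    """Create a new module in the house shape, refusing bad names and overwrites."""
    if not re.fullmatch(r"[a-z][a-z0-9_]*", name):
        raise ValueError(f"module name must be snake_case, got {name!r}")
    targets = {root / path.format(name=name): body.format(name=name)
               for path, body in TEMPLATES.items()}
    # Check everything first so a clash never leaves a half-generated module behind.
    clashes = [path for path in targets if path.exists()]
    if clashes:
        raise FileExistsError(f"already exists: {clashes[0]}")
    for path, body in targets.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body, encoding="utf-8")
    return list(targets)
```

The conventions file then needs only a single rule, along the lines of "create modules with `scaffold_module`, never by hand," and the shape follows from the generator.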
If you find yourself correcting the same thing in review twice, it's not a review comment — it's a missing guide. Promote it into the harness and you never correct it by hand again.
Sensors: catch what slips through
No guide is perfect, so the second half is observation. A sensor inspects generated code and either blocks it or feeds the problem back so the agent can self-correct. The magic of sensors with agents is the closed loop: a human who gets a failing test fixes it; an agent wired to the same signal does too, automatically, before you ever see the diff.
Computational sensors (fast, deterministic)
- Linters & formatters — style and obvious smells. Cheap, run on every save.
- Type checkers — a whole category of bugs eliminated before runtime. Strongly typed projects give agents a tighter, more correctable harness.
- Test suites — the highest-signal sensor you have. An agent with a fast test suite can verify its own work in a loop.
- Structural / architecture fitness checks — "no import from this layer," "bundle under N KB," "p95 latency under 200 ms." Make the architecture self-enforcing.
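As an illustration of the first kind of fitness check, here is a small Python sketch that uses the standard-library `ast` module to flag forbidden cross-layer imports. The layer names and the `FORBIDDEN` table are hypothetical; the point is how little code the idea needs.

```python
import ast

# Hypothetical rules: top-level package -> packages it must never import.
FORBIDDEN: dict[str, set[str]] = {
    "domain": {"api", "infra"},  # the core must not reach outward
    "api": {"infra"},            # handlers go through the domain, not straight to adapters
}


def layer_violations(layer: str, source: str) -> list[str]:
    """Return a message for each forbidden import in one file's source code."""
    banned = FORBIDDEN.get(layer, set())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) stay inside the package, so they are skipped.
            modules = [node.module]
        else:
            continue
        for module in modules:
            if module.split(".")[0] in banned:
                violations.append(f"line {node.lineno}: {layer} imports {module}")
    return violations
```

Run it over every file in a layer's directory in CI and fail on any output. An agent that receives `line 2: domain imports infra.db` has everything it needs to fix the import itself.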
Inferential sensors (semantic, slower)
Some problems no linter catches: a function named for what it used to do, a subtle off-by-one in business logic, a missing edge case. Here an AI code-review agent earns its cost — semantic judgment a deterministic tool can't provide. The trick is to use it where semantics matter and not burn inference cycles on things a type checker already guarantees.
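A cheap deterministic pre-filter can keep the reviewer pointed at files where semantics can actually change. This Python sketch is one way to do it; the skip lists and the `needs_semantic_review` name are assumptions to tune per repo.

```python
from pathlib import PurePosixPath

# Hypothetical skip lists: files where semantic review adds little.
SKIP_SUFFIXES = (".lock", ".min.js", ".svg")
SKIP_DIRS = {"generated", "vendor"}


def _significant_lines(text: str) -> list[str]:
    # Drop blank lines and trailing whitespace but keep indentation,
    # since indentation is meaningful in languages like Python.
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def needs_semantic_review(path: str, old: str, new: str) -> bool:
    """Decide whether a changed file is worth sending to the AI reviewer."""
    if any(part in SKIP_DIRS for part in PurePosixPath(path).parts):
        return False
    if path.endswith(SKIP_SUFFIXES):
        return False
    # Whitespace-only edits are the formatter's job, not the reviewer's.
    if _significant_lines(old) == _significant_lines(new):
        return False
    return True
```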
The art isn't adding more checks. It's putting the cheapest sufficient check at each point — computational where you can, inferential only where you must.
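To make that ordering concrete, here is a minimal Python sketch of a sensor runner that sorts deterministic checks by cost, fails fast, and calls inferential sensors only once every computational one passes. The `Sensor` type and its `cost` field are hypothetical; in practice each `check` would wrap a linter, type checker, test run, or review agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sensor:
    name: str
    cost: float                        # rough relative cost, e.g. seconds per run
    inferential: bool                  # True for an AI reviewer, False for deterministic tools
    check: Callable[[str], list[str]]  # takes a diff, returns the problems it found


def run_sensors(diff: str, sensors: list[Sensor]) -> tuple[Optional[str], list[str]]:
    """Run deterministic sensors cheapest first; only then pay for inferential ones.

    Returns (name of the first failing sensor, its problems), or (None, []) if all pass.
    """
    computational = sorted((s for s in sensors if not s.inferential), key=lambda s: s.cost)
    inferential = sorted((s for s in sensors if s.inferential), key=lambda s: s.cost)
    for sensor in computational + inferential:
        problems = sensor.check(diff)
        if problems:
            # Fail fast: no point spending on slower checks until this one is fixed.
            return sensor.name, problems
    return None, []
```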
Wiring it together: the loop
Guides and sensors aren't a checklist; they're a loop. Generation happens inside the guides; sensors run on the output; failures feed back; the agent corrects; sensors run again. Only when the computational and inferential sensors are green does a human look, and they look at the one thing the harness can't judge: is this the right thing?
That's the payoff. The harness takes over the routine maintainability and architecture checks, so your scarce, expensive human attention lands on behaviour and intent: the work that actually needs a person.
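The loop itself is small. Below is a minimal Python sketch in which `generate` stands in for the agent and `sense` for the sensor pipeline; both are hypothetical callables rather than any particular agent framework's API.

```python
from typing import Callable, Optional

Generate = Callable[[str, list[str]], str]  # (task, feedback) -> candidate diff
Sense = Callable[[str], list[str]]          # candidate diff -> problems (empty means green)


def agent_loop(task: str, generate: Generate, sense: Sense, max_rounds: int = 5) -> Optional[str]:
    """Generate, sense, feed failures back, and retry until green or out of rounds."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        feedback = sense(candidate)
        if not feedback:
            return candidate  # green: hand to a human for the "right thing?" review
    return None               # still failing: escalate instead of looping forever
```

The `max_rounds` cap is a deliberate design choice: an agent that cannot get to green should escalate to a person, not churn indefinitely.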
Start small, grow the harness
You don't build this all at once. Start with the two highest-leverage pieces, a real conventions file (guide) and a fast test suite in CI (sensor), and let the harness grow every time review catches something a control could have caught. Over months it compounds into a system where shipping AI-generated code feels less like supervising and more like delegating.
Next: why timing matters — keeping quality left across the whole development lifecycle.
Want a harness that actually holds?
We design the guides and sensors that let teams ship AI-assisted code with confidence.
Talk to Nilerobot →