MisLinux: How I Manage AI Agents Like a New Hire (Detailed Briefs, Reviewed Output, CIR Notes)

By Vicente Arteaga Gomez


This is part of my Kubernetes-on-Hetzner and operations series on MisLinux. It describes how I actually *manage* AI coding agents day to day: not the tools themselves, but the operating model that keeps them useful in production work.

Most teams experiment with AI in one of two broken modes:

- Magic button: "fix it" with no context, hope for the best.
- Copilot as autocomplete: fast typing, no durable output.

The model that works for business and operations automation is a third one:

> Treat the agent like a new hire who is fast, literal, and forgetful — brilliant at execution when the brief is clear, dangerous when the brief is vague, and unable to remember yesterday unless you wrote it down.

The contract I use with every agent session

| New-hire parallel | What I ask the AI to do |
| --- | --- |
| Onboarding packet | Point it at `AGENTS.md`, the relevant `SKILL.md`, and proof artifacts from the last run |
| Written task brief | State the goal, constraints, what must not change, and how success is verified |
| Deliverable, not chat | Require a script/program/test/CIR entry, not "here is what you could run" |
| Code review | Read the diff myself; rerun dry-run/smoke tests before any production path |
| Manager notes | Append Context / Intent / Rationale (CIR) when behavior or tooling changes |

If the session ends with only prose, I failed the brief.

Step 1: Route manual work through the agent

When I catch myself doing something twice — spreadsheet reconciliation, onboarding checks, config diff, report export — I stop and rephrase:

> "Do this for me, but implement it as a repeatable script with tests and a CIR entry. Dry-run first."

That single sentence forces three outputs:

- Automation: PHP, Python, or shell code the agent can rerun without the conversation
- Guardrails: PHPUnit or smoke tests on the invariant logic
- Memory: an `AGENTS.md` bullet explaining *why* the approach exists

The agent is the implementer. I remain the approver.
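
As a rough illustration of "repeatable script, dry-run first", here is a minimal Python sketch. The task (reconciling a small settings dictionary), the `--apply` flag, and all function names are hypothetical stand-ins, not taken from any of my real scripts. The point is only the shape: planning always runs, and mutation needs an explicit opt-in.

```python
import argparse
import json


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical config reconcile task")
    # Dry-run is the default; mutation requires an explicit flag.
    parser.add_argument("--apply", action="store_true",
                        help="write changes (default: dry-run, report only)")
    return parser


def plan_changes(current, desired):
    """Return the changes needed to move `current` to `desired`."""
    changes = []
    for key in sorted(desired):
        if current.get(key) != desired[key]:
            changes.append({"key": key, "from": current.get(key), "to": desired[key]})
    return changes


def run(argv, current, desired):
    """Plan always; mutate `current` only when --apply was passed."""
    args = build_parser().parse_args(argv)
    changes = plan_changes(current, desired)
    if args.apply:
        for change in changes:
            current[change["key"]] = change["to"]
    return {"mode": "apply" if args.apply else "dry-run", "changes": changes}


# Hypothetical sample data; a real script would load these from files.
state = {"timeout": 30}
report = run([], state, {"timeout": 60, "retries": 3})
print(json.dumps(report, indent=2))
```

Because `plan_changes` is a pure function, it is the natural target for the guardrail tests, while `run` stays thin enough to review by eye.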

Step 2: Write briefs the way you would for a junior engineer

Vague briefs produce confident wrong code. I include:

- Scope boundary: read-only vs production mutation; which namespace/network/sheet
- Inputs: file paths, env vars, credentials *location* (never the secret itself)
- Expected output: JSON shape, exit codes, artifact directory layout
- Verification: exact command I will run to prove it worked
- Discarded options: "do not patch production CronJob inline; fix the generator"

Example weak brief:

> Clean up the onboarding mess.

Example strong brief:

> Add `onboard.php --plan` read-only mode for YAML manifests. Must label EXTERNAL steps, emit JSON for monitoring, PHPUnit for the diff engine, CIR in `operations/adsystem/AGENTS.md`, and save proof under `history/<timestamp>/`. No live DB writes in this slice.

The second brief is longer because ambiguity is expensive.
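
One way to make the brief fields harder to skip is to treat the brief as data and check it before the session starts. The sketch below is hypothetical: the `TaskBrief` class and `missing_fields` helper are illustrative names I am assuming here, not part of any tool mentioned in this post.

```python
from dataclasses import dataclass


@dataclass
class TaskBrief:
    """Hypothetical container mirroring the brief fields listed above."""
    goal: str
    scope_boundary: str
    inputs: str
    expected_output: str
    verification: str
    discarded_options: str = ""  # optional, but worth filling in


REQUIRED = ("goal", "scope_boundary", "inputs", "expected_output", "verification")


def missing_fields(brief):
    """Return the names of required fields that are empty or whitespace."""
    return [name for name in REQUIRED if not getattr(brief, name).strip()]


weak = TaskBrief(
    goal="Clean up the onboarding mess.",
    scope_boundary="",
    inputs="",
    expected_output="",
    verification="",
)
print(missing_fields(weak))
```

The weak brief from the example fails on four of five required fields, which is a fair summary of why it produces confident wrong code.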

Step 3: Review output like a manager, not a spectator

I assume the first pass is wrong somewhere. My review checklist:

- [ ] Did it touch only the files the brief allowed?
- [ ] Does dry-run default to safe?
- [ ] Are tests asserting behavior, not implementation trivia?
- [ ] Does failure output say expected vs actual?
- [ ] Is there a CIR entry with Rationale (not just Context)?
- [ ] Would another operator know how to rerun this in six months?
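
The first checklist item is easy to automate. A minimal sketch, assuming the list of changed paths comes from something like `git diff --name-only` and that the brief's scope is expressed as glob patterns; the function name and the sample paths are hypothetical.

```python
from fnmatch import fnmatch


def files_outside_scope(changed_paths, allowed_patterns):
    """Return changed paths that match none of the allowed glob patterns.

    Note: fnmatch's `*` also matches `/`, so `operations/adsystem/*`
    covers nested files under that directory.
    """
    return [
        path for path in changed_paths
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical example: the brief allowed only the adsystem directory.
changed = [
    "operations/adsystem/onboard.php",
    "operations/adsystem/tests/DiffTest.php",
    "deploy/cronjob.yaml",
]
violations = files_outside_scope(changed, ["operations/adsystem/*"])
print(violations)
```

An empty result does not replace reading the diff; it only tells me quickly when the agent wandered outside the brief.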

When something is off, I nudge rather than restart:

> "Keep the planner, but EXTERNAL must not count as WRONG. Add a test. Update CIR with why."

That preserves context already loaded in the session.
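
To show what that nudge asks for, here is a Python sketch of the classification rule plus the test that pins it down. It is not the real planner (which is PHP); the step structure, the `classify` function, and my reading of EXTERNAL as "a step the tool cannot perform or verify itself" are all assumptions made for illustration.

```python
def classify(step, observed):
    """Return OK, WRONG, or EXTERNAL for one manifest step.

    Assumption: steps flagged `external` are reported but never judged.
    """
    if step.get("external"):
        return "EXTERNAL"
    if observed.get(step["name"]) == step["expected"]:
        return "OK"
    return "WRONG"


def plan_summary(steps, observed):
    """Classify every step and count only genuine mismatches as wrong."""
    statuses = {step["name"]: classify(step, observed) for step in steps}
    wrong = sum(1 for status in statuses.values() if status == "WRONG")
    return {"statuses": statuses, "wrong": wrong}


def test_external_is_not_wrong():
    steps = [
        {"name": "dns_record", "expected": "present", "external": True},
        {"name": "db_row", "expected": "present"},
    ]
    summary = plan_summary(steps, {"db_row": "present"})
    assert summary["statuses"]["dns_record"] == "EXTERNAL"
    assert summary["wrong"] == 0


test_external_is_not_wrong()
```

The test asserts the behavior I care about (external steps never inflate the failure count) and says nothing about how the planner is structured internally.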

Step 4: Demand CIR annotations for every non-obvious choice

Models forget. Repositories should not.

A good CIR entry captures why, not what:

- Context: onboarding checks were duplicated in chat and cron.
  Intent: one manifest drives plan, apply, and readiness monitoring.
  Rationale: without a single contract, AI regenerated slightly different
  checks each session; manifest + `--plan` makes drift visible before apply.

I explicitly ask:

> "Add a CIR entry to the nearest AGENTS.md explaining what you tried, what failed, and what would re-break if reverted."

Without that, the next agent (or me in three weeks) "cleans up" the guardrail that prevented an outage.
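
A small helper can enforce the "Rationale, not just Context" rule mechanically. This is a sketch under assumptions: the function names are hypothetical, and the entry layout simply copies the example above rather than any formal CIR specification.

```python
def format_cir(context, intent, rationale):
    """Build one CIR entry, refusing to emit it if any part is blank."""
    parts = (("Context", context), ("Intent", intent), ("Rationale", rationale))
    for label, value in parts:
        if not value.strip():
            raise ValueError(f"CIR entry is missing {label}")
    return (
        f"- Context: {context.strip()}\n"
        f"  Intent: {intent.strip()}\n"
        f"  Rationale: {rationale.strip()}\n"
    )


def append_cir(path, context, intent, rationale):
    """Append a validated CIR entry to the given AGENTS.md-style file."""
    entry = format_cir(context, intent, rationale)
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


print(format_cir(
    "onboarding checks were duplicated in chat and cron.",
    "one manifest drives plan, apply, and readiness monitoring.",
    "manifest + --plan makes drift visible before apply.",
))
```

Validation happens before the file is opened, so a rejected entry never leaves a half-written note behind.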

Step 5: Reduce manual touchpoints every iteration

Each manual step is a source of randomness:

- Wrong network selected
- Stale token
- Spreadsheet column renamed overnight
- "I thought we already ran that"

My metric: count human decisions per run. If a process still needs ten, the next session removes two — with tests.

Automation does not mean zero humans. It means humans approve gates, not re-type data.
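
The metric itself can be tracked with very little code. In this sketch the runbook is a plain list of steps with a `manual` flag; the step names and both helper functions are hypothetical, chosen only to mirror the examples above.

```python
def manual_decisions(steps):
    """Return the names of steps that still need a human decision."""
    return [step["name"] for step in steps if step["manual"]]


def improved_since(steps, previous_count):
    """True when this iteration needs fewer human decisions than the last."""
    return len(manual_decisions(steps)) < previous_count


# Hypothetical runbook after one automation pass.
runbook = [
    {"name": "select network", "manual": True},
    {"name": "refresh token", "manual": False},
    {"name": "export spreadsheet", "manual": False},
    {"name": "approve apply gate", "manual": True},
]
print(manual_decisions(runbook))
print(improved_since(runbook, previous_count=4))
```

The approval gate stays manual on purpose; the goal is to shrink the list toward gates only, not to empty it.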

What this looks like in practice

Recent classes of work where the new-hire model helped:

| Chore | Agent built | I approved |
| --- | --- | --- |
| Publisher onboarding diff | YAML manifest + `--plan` library | Read-only proof JSON before any apply |
| Control panel ad-request filter | Second-pass PHP + PHPUnit | One-off kubectl job logs |
| Blogger publish recovery | CDP publish helper fixes + crawl gate | Public HTML verification |
| Finance month fill | Mapping libraries + failure email contract | Dry-run Job in cluster |

In every case, the valuable artifact was not the chat. It was the script + test + CIR triad.

Anti-patterns I stop early

| Anti-pattern | Why it fails |
| --- | --- |
| "Just do it live" | No replay, no audit trail |
| Accepting code without tests | Regresses silently on the next model pass |
| Letting AI edit production YAML by hand | Drifts from the generator, which is the source of truth |
| Skipping CIR "because it is obvious" | Obvious fades in two weeks |
| Treating refusal as failure | A good agent should block unsafe mutations |

Sibling posts on this blog

Together they describe one strategy: delegate execution to the agent, keep judgment and memory in the repo.

---

Independence note: AI tools mentioned (Copilot, Claude, Codex, Cursor) reflect my stack. No vendor sponsorship implied.