By Vicente Arteaga Gomez
MisLinux · Last updated: May 5, 2026
This is part of my Kubernetes-on-Hetzner-and-operations series on MisLinux. It is not a vendor tutorial. It is a working rule I adopted after too many live investigations started with a terminal and ended with fuzzy memory.
The mistake is easy to make:
- something looks wrong
- I jump into the shell
- I start reading logs, manifests, dashboards, or browser state
- two hours later I remember the fix, but not the full reasoning path
That is exactly how the same class of incident keeps returning in slightly different forms.
What I write down before I start
I do not try to write a novel. I write three things:
| Field | What I want |
|---|---|
| Context | What is happening right now, and what changed recently |
| Intent | What I am trying to prove or change |
| Rationale | Why this path is safer than the alternatives |
That is the smallest version of a change log that still survives a bad week.
Why I do this even for read-only work
People often treat documentation as a mutation-only habit. I think that is backwards.
Read-only investigations still shape later decisions:
- which graphs I trusted
- which logs were misleading
- which browser path was blocked by auth or anti-bot checks
- which saved artifact became the source of truth later
If I do not log that context while the investigation is fresh, the next run starts from folklore.
The exact failure pattern I am trying to avoid
The failure is not "I forgot everything." It is more subtle:
- I remember the conclusion
- I forget the rejected paths
- I forget the external constraint
- I forget which proof actually convinced me
That is dangerous because a future me can look at the final state and decide some "unnecessary" workaround should be removed, when in reality it was the part keeping the system safe.
My practical template
This is the kind of note I want near the work:
Context: public traffic looked down on one origin, but the service was partially balanced elsewhere.
Intent: verify whether this is a real global drop or only a per-origin distribution change.
Rationale: checking the summed public path first avoids "fixing" a healthy balance event as if it were an outage.
That template works for code, dashboards, browser automation, one-off reports, and production runbooks.
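The same template can be kept as a small data structure so that scripts can produce it too. This is a minimal sketch, not a tool from this series: `InvestigationNote` and `render` are hypothetical names, and the only rule it enforces is that none of the three fields may be empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvestigationNote:
    """Hypothetical three-field note: Context / Intent / Rationale."""

    context: str
    intent: str
    rationale: str

    def __post_init__(self) -> None:
        # Refuse empty fields so the note cannot silently shrink to a title.
        for name in ("context", "intent", "rationale"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")

    def render(self) -> str:
        """Return the note in the same plain-text layout as the template."""
        return (
            f"Context: {self.context}\n"
            f"Intent: {self.intent}\n"
            f"Rationale: {self.rationale}\n"
        )


note = InvestigationNote(
    context="public traffic looked down on one origin, but the service was partially balanced elsewhere.",
    intent="verify whether this is a real global drop or only a per-origin distribution change.",
    rationale='checking the summed public path first avoids "fixing" a healthy balance event as if it were an outage.',
)
print(note.render())
```

Rejecting empty fields is the design choice that matters here: a note with a blank Rationale is the version that looks complete and explains nothing later.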
A small diagram of the decision path
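The flow is short: notice the symptom, write the Context / Intent / Rationale note, and only then open the shell. The point is simple: I want the note *before* I accumulate terminal tabs, ad hoc commands, and screenshots I will not be able to explain later.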
The command trail I usually keep
I do not dump every command. I keep the commands that define the reasoning path:
```shell
# 1. Confirm the public symptom
curl -I https://example-service.invalid/health

# 2. Check whether the issue is global or isolated
kubectl get pods -n example-namespace -o wide

# 3. Preserve the before-state artifact
kubectl get deployment example -n example-namespace -o yaml > before.yaml

# 4. Check the data source behind the alert
python3 inspect-metric-source.py --metric service_request_rate
```
This is enough to reconstruct the investigation later without pretending the shell history is documentation.
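The trail can also be kept as data instead of shell history. The following is a minimal sketch, assuming a JSON Lines file is an acceptable format; `record_step`, `read_trail`, and the field names are hypothetical. It only records a command and the reason for it, and does not run anything.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_step(trail: Path, command: str, why: str) -> dict:
    """Append one reasoning-path step to a JSON Lines trail file."""
    entry = {
        "at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "command": command,
        "why": why,
    }
    # Append mode keeps earlier steps intact, so the order is the order of work.
    with trail.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_trail(trail: Path) -> list[dict]:
    """Load the trail back in the order the steps were recorded."""
    with trail.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Calling `record_step(Path("trail.jsonl"), "curl -I https://example-service.invalid/health", "Confirm the public symptom")` once per step leaves a file that `read_trail` returns in the same order, with the reason attached to each command.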
Failure case: when I skip the note
When I skip the note, the same bad pattern appears:
- the artifact folder contains files but not their meaning
- the dashboard screenshot exists but not why it mattered
- the final fix is remembered as "obvious"
- the next operator repeats the discarded approach first
That is how harmless-looking cleanup becomes accidental regression.
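One way to keep meaning attached to the artifact folder is a sidecar manifest. This is a sketch under assumptions: the file name `MANIFEST.json` and the functions `describe_artifact` and `undescribed_artifacts` are hypothetical, and the `authoritative` flag is only an illustration of marking a file as proof rather than probe.

```python
import json
from pathlib import Path

MANIFEST_NAME = "MANIFEST.json"  # hypothetical sidecar file name


def load_manifest(folder: Path) -> dict:
    """Return the manifest for a folder, or an empty dict if none exists yet."""
    path = folder / MANIFEST_NAME
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def describe_artifact(
    folder: Path, filename: str, meaning: str, authoritative: bool = False
) -> None:
    """Record why a file was saved and whether it counts as proof."""
    manifest = load_manifest(folder)
    manifest[filename] = {"meaning": meaning, "authoritative": authoritative}
    (folder / MANIFEST_NAME).write_text(
        json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8"
    )


def undescribed_artifacts(folder: Path) -> list[str]:
    """List files in the folder that have no manifest entry yet."""
    manifest = load_manifest(folder)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.name != MANIFEST_NAME and p.name not in manifest
    )
```

Running `undescribed_artifacts` at the end of a session gives the list of files that the next operator would otherwise have to guess about.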
What this changes when AI agents are involved
AI agents amplify the value of explicit context because they can move faster than humans through the same investigation surface.
That speed is useful only if the task is bounded clearly enough that the agent is not inventing the contract while it works.
Without a context note:
- the agent may optimize the wrong thing
- a saved artifact may be treated as authoritative when it was only a probe
- a fallback path may be removed because it looks redundant
With a context note, the agent is executing inside a known frame.
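A known frame can be as plain as a text brief handed to the agent before it starts. The sketch below assumes nothing about any particular agent API: `build_task_brief` is a hypothetical helper that produces a string, and the `keep` and `probes` parameters are illustrative ways to name fallback paths that must stay and artifacts that are only probes.

```python
from typing import Sequence


def build_task_brief(
    context: str,
    intent: str,
    rationale: str,
    keep: Sequence[str] = (),
    probes: Sequence[str] = (),
) -> str:
    """Build a plain-text brief; refuse if the three core fields are incomplete."""
    required = {"context": context, "intent": intent, "rationale": rationale}
    missing = [name for name, value in required.items() if not value.strip()]
    if missing:
        raise ValueError("refusing to brief an agent without: " + ", ".join(missing))

    lines = [
        f"Context: {context}",
        f"Intent: {intent}",
        f"Rationale: {rationale}",
    ]
    # Name the things that look redundant but must survive the task.
    lines += [f"Do not remove: {item}" for item in keep]
    # Name the saved files that were exploratory, so they are not treated as truth.
    lines += [f"Probe only, not authoritative: {item}" for item in probes]
    return "\n".join(lines)
```

The refusal on missing fields is deliberate: if the note cannot be written, the task is not bounded yet, and the agent would be the one defining it.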
A simple operator checklist
Before touching a live-adjacent system, I now want:
- one sentence for the symptom
- one sentence for the intended outcome
- one sentence for why this path is the least risky
- one artifact that captures the before-state
That is a very low bar, but it removes a surprising amount of confusion.
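The checklist is small enough to check mechanically. This is a minimal sketch with hypothetical names (`preflight_gaps` and its parameters); it only verifies that each sentence is non-empty and that the before-state artifact exists and is not an empty file. It cannot judge whether the sentences are any good.

```python
from pathlib import Path


def preflight_gaps(
    symptom: str,
    outcome: str,
    least_risky_because: str,
    before_artifact: Path,
) -> list[str]:
    """Return the checklist items that are still missing, in checklist order."""
    gaps = []
    if not symptom.strip():
        gaps.append("symptom")
    if not outcome.strip():
        gaps.append("intended outcome")
    if not least_risky_because.strip():
        gaps.append("least-risk reason")
    # An empty file usually means a failed redirect, not a captured state.
    if not before_artifact.is_file() or before_artifact.stat().st_size == 0:
        gaps.append("before-state artifact")
    return gaps
```

An empty return value means all four items are present; anything else is the list of what to write down or capture before touching the system.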
What I'd do differently now
I used to think I needed "full documentation" before any of this would help. I no longer believe that.
What I would do differently now is start with a tiny Context / Intent / Rationale note, then let the richer artifacts accumulate around it. Small, explicit notes beat larger undocumented piles of output every time.