Friday, May 8, 2026

Why I Log Context Before I Touch Anything Live

By Vicente Arteaga Gomez

MisLinux · Last updated: May 5, 2026

This is part of my MisLinux series on Kubernetes on Hetzner and operations. It is not a vendor tutorial. It is a working rule I adopted after too many live investigations started with a terminal and ended with fuzzy memory.

The mistake is easy to make:

  1. something looks wrong
  2. I jump into the shell
  3. I start reading logs, manifests, dashboards, or browser state
  4. two hours later I remember the fix, but not the full reasoning path

That is exactly how the same class of incident keeps returning in slightly different forms.

What I write down before I start

I do not try to write a novel. I write three things:

| Field | What I want |
|---|---|
| Context | What is happening right now, and what changed recently |
| Intent | What I am trying to prove or change |
| Rationale | Why this path is safer than the alternatives |

That is the smallest version of a change log that still survives a bad week.
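
As a rough illustration, the three fields map naturally onto a tiny record type. The sketch below is an assumption about how such a note could be modeled in Python, not tooling from this series; the class name `ContextNote` and its `render` method are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextNote:
    """Hypothetical three-field note; the names are illustrative only."""

    context: str    # what is happening right now, and what changed recently
    intent: str     # what I am trying to prove or change
    rationale: str  # why this path is safer than the alternatives

    def render(self) -> str:
        # One line per field, so the note stays readable in any text file.
        return (
            f"Context: {self.context}\n"
            f"Intent: {self.intent}\n"
            f"Rationale: {self.rationale}\n"
        )


note = ContextNote(
    context="public traffic looked down on one origin",
    intent="verify whether this is a global drop or a per-origin shift",
    rationale="checking the summed public path first avoids a false fix",
)
print(note.render())
```

Freezing the dataclass is a deliberate choice in this sketch: the note records what was believed before the work started, so later findings belong in a new note rather than in an edit to the old one.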

Why I do this even for read-only work

People often treat documentation as a mutation-only habit. I think that is backwards.

Read-only investigations still shape later decisions:

  • which graphs I trusted
  • which logs were misleading
  • which browser path was blocked by auth or anti-bot checks
  • which saved artifact became the source of truth later

If I do not log that context while the investigation is fresh, the next run starts from folklore.

The exact failure pattern I am trying to avoid

The failure is not "I forgot everything." It is more subtle:

  • I remember the conclusion
  • I forget the rejected paths
  • I forget the external constraint
  • I forget which proof actually convinced me

That is dangerous because a future me can look at the final state and decide some "unnecessary" workaround should be removed, when in reality it was the part keeping the system safe.

My practical template

This is the kind of note I want near the work:

Context: public traffic looked down on one origin, but the service was partially balanced elsewhere.
Intent: verify whether this is a real global drop or only a per-origin distribution change.
Rationale: checking the summed public path first avoids "fixing" a healthy balance event as if it were an outage.

That template works for code, dashboards, browser automation, one-off reports, and production runbooks.
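
Because the template is plain `Field: value` text, a script can check that none of the three fields was skipped. This is a minimal sketch under that assumption; `parse_note` and `REQUIRED_FIELDS` are hypothetical names, and it expects each field to sit on a single line.

```python
REQUIRED_FIELDS = ("Context", "Intent", "Rationale")


def parse_note(text: str) -> dict[str, str]:
    """Parse 'Field: value' lines and require all three fields to be filled."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() in REQUIRED_FIELDS:
            fields[key.strip()] = value.strip()
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("note is missing: " + ", ".join(missing))
    return fields


example = """\
Context: public traffic looked down on one origin, but the service was partially balanced elsewhere.
Intent: verify whether this is a real global drop or only a per-origin distribution change.
Rationale: checking the summed public path first avoids "fixing" a healthy balance event as if it were an outage.
"""
parsed = parse_note(example)
print(parsed["Intent"])
```

A note with an empty or absent field raises `ValueError`, which makes it cheap to refuse a half-written note before work begins.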

A small diagram of the decision path

[Figure: context logging flow diagram]

The point of that flow is simple: I want the note *before* I accumulate terminal tabs, ad hoc commands, and screenshots I will not be able to explain later.

The command trail I usually keep

I do not dump every command. I keep the commands that define the reasoning path:

# 1. Confirm the public symptom
curl -I https://example-service.invalid/health

# 2. Check whether the issue is global or isolated
kubectl get pods -n example-namespace -o wide

# 3. Preserve the before-state artifact
kubectl get deployment example -n example-namespace -o yaml > before.yaml

# 4. Check the data source behind the alert
python3 inspect-metric-source.py --metric service_request_rate

This is enough to reconstruct the investigation later without pretending the shell history is documentation.
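
One way to keep that trail honest is to refuse to store a command without the reason it was run. The sketch below only records text and never executes anything; `CommandTrail` is a hypothetical helper, not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class CommandTrail:
    """Hypothetical recorder: stores (reason, command) pairs, runs nothing."""

    steps: list[tuple[str, str]] = field(default_factory=list)

    def add(self, reason: str, command: str) -> None:
        # A command without a reason is just shell history again.
        if not reason.strip():
            raise ValueError("every recorded command needs a reason")
        self.steps.append((reason.strip(), command.strip()))

    def render(self) -> str:
        # Same shape as the trail above: numbered comment, then the command.
        blocks = [
            f"# {number}. {reason}\n{command}"
            for number, (reason, command) in enumerate(self.steps, start=1)
        ]
        return "\n\n".join(blocks) + "\n"


trail = CommandTrail()
trail.add("Confirm the public symptom",
          "curl -I https://example-service.invalid/health")
trail.add("Preserve the before-state artifact",
          "kubectl get deployment example -n example-namespace -o yaml > before.yaml")
print(trail.render())
```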

Failure case: when I skip the note

When I skip the note, the same bad pattern appears:

  • the artifact folder contains files but not their meaning
  • the dashboard screenshot exists but not why it mattered
  • the final fix is remembered as "obvious"
  • the next operator repeats the discarded approach first

That is how harmless-looking cleanup becomes accidental regression.
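
The first item on that list can be checked mechanically. This sketch assumes a convention that is not described in this post: each artifact folder carries a `MANIFEST.json` mapping file names to one sentence about why the file matters. The file name and the function are both hypothetical.

```python
import json
from pathlib import Path

MANIFEST_NAME = "MANIFEST.json"  # hypothetical convention, not a standard


def undescribed_artifacts(folder: Path) -> list[str]:
    """Return names of files in `folder` that have no recorded meaning."""
    manifest_path = folder / MANIFEST_NAME
    described: dict[str, str] = {}
    if manifest_path.exists():
        described = json.loads(manifest_path.read_text(encoding="utf-8"))
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_file()
        and entry.name != MANIFEST_NAME
        # A blank description counts as no description.
        and not str(described.get(entry.name, "")).strip()
    )
```

Calling `undescribed_artifacts(Path("artifacts"))` on a hypothetical `artifacts` folder would list every file still waiting for its one-sentence explanation.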

What this changes when AI agents are involved

AI agents amplify the value of explicit context because they can move faster than humans through the same investigation surface.

That is useful only if the task is bounded clearly enough that the agent is not inventing the contract while it works.

Without a context note:

  • the agent may optimize the wrong thing
  • a saved artifact may be treated as authoritative when it was only a probe
  • a fallback path may be removed because it looks redundant

With a context note, the agent is executing inside a known frame.
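
To make the "known frame" concrete, here is a minimal sketch of a plain-text brief that could be handed to an agent. It assumes no particular agent framework or API; `build_agent_brief` and the example artifact and fallback names are hypothetical.

```python
def build_agent_brief(
    context: str,
    intent: str,
    rationale: str,
    probe_artifacts: list[str],
    keep_in_place: list[str],
) -> str:
    """Assemble a plain-text task frame; refuse if any core field is empty."""
    core = (("context", context), ("intent", intent), ("rationale", rationale))
    for name, value in core:
        if not value.strip():
            raise ValueError(f"refusing to brief an agent without {name}")
    lines = [
        f"Context: {context.strip()}",
        f"Intent: {intent.strip()}",
        f"Rationale: {rationale.strip()}",
        # Probes are named so they are not mistaken for a source of truth.
        "Probe-only artifacts (not authoritative):",
        *[f"  - {item}" for item in probe_artifacts],
        # Fallbacks are named so they are not removed as redundant.
        "Do not remove, even if it looks redundant:",
        *[f"  - {item}" for item in keep_in_place],
    ]
    return "\n".join(lines) + "\n"


brief = build_agent_brief(
    context="public traffic looked down on one origin",
    intent="verify whether this is a global drop or a per-origin shift",
    rationale="checking the summed public path first avoids a false fix",
    probe_artifacts=["metric-probe.json"],         # hypothetical file
    keep_in_place=["per-origin fallback route"],   # hypothetical workaround
)
print(brief)
```

The two extra lists line up with the last two risks named above: a probe read as authoritative, and a fallback removed because it looks redundant.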

A simple operator checklist

Before touching a live-adjacent system, I now want:

  • one sentence for the symptom
  • one sentence for the intended outcome
  • one sentence for why this path is the least risky
  • one artifact that captures the before-state

That is a very low bar, but it removes a surprising amount of confusion.
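
The checklist is small enough to run as a pre-flight check. The following is a sketch under the assumption that the before-state lives in a single file; the function name and labels are hypothetical.

```python
from pathlib import Path


def missing_checklist_items(
    symptom: str,
    intended_outcome: str,
    least_risky_because: str,
    before_artifact: Path,
) -> list[str]:
    """Return the checklist items still missing (an empty list means ready)."""
    missing = []
    sentences = {
        "symptom": symptom,
        "intended outcome": intended_outcome,
        "why this path is least risky": least_risky_because,
    }
    for label, text in sentences.items():
        if not text.strip():
            missing.append(label)
    # An absent or empty file does not capture a before-state.
    if not before_artifact.is_file() or before_artifact.stat().st_size == 0:
        missing.append("before-state artifact")
    return missing
```

An empty result means all four items exist. It says nothing about whether the sentences are any good, which still takes a human read.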

What I'd do differently now

I used to think I needed "full documentation" before any of this would help. I no longer believe that.

What I would do differently now is start with a tiny Context / Intent / Rationale note, then let the richer artifacts accumulate around it. In my experience, a small explicit note beats a larger undocumented pile of output.