Monday, June 1, 2026

Why Live Follow-Up Files Beat Forgetting the Last Incident

By Vicente Arteaga Gomez

MisLinux · Last updated: May 5, 2026

This is part of my MisLinux series on Kubernetes on Hetzner and operations. It describes the follow-up system I now prefer for ongoing operational questions.

The incident is not really over just because the page stopped firing.

Some of the most important work happens after the immediate pressure is gone:

  • verify whether the fix actually held
  • confirm second-order metrics recovered
  • watch for regressions a few days later
  • record which questions are still open

If I do not keep those as live follow-up items, they turn into vague intentions that disappear under the next urgent task.

What I want a follow-up file to contain

| Section        | Purpose                              |
|----------------|--------------------------------------|
| Original issue | why this item exists                 |
| Analysis       | what the investigation actually found |
| Actions taken  | what changed already                 |
| Future checks  | what still needs to be watched       |
| Artifact links | where the evidence lives             |

That is enough to turn a lingering concern into an inspectable queue item.
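To make those five sections concrete, here is a minimal Python sketch that scaffolds a new follow-up file. Only the section names and purposes come from the table above. The function names, the `Opened:` and `Status:` header lines, and the date-prefixed filename are assumptions I made for illustration, not a fixed convention.

```python
from datetime import date
from pathlib import Path

# Section names and purposes are taken from the table above.
SECTIONS = [
    ("Original issue", "why this item exists"),
    ("Analysis", "what the investigation actually found"),
    ("Actions taken", "what changed already"),
    ("Future checks", "what still needs to be watched"),
    ("Artifact links", "where the evidence lives"),
]


def render_follow_up(title: str, opened: date) -> str:
    """Return the markdown skeleton for one follow-up item."""
    # The Opened/Status header lines are a hypothetical convention.
    lines = [f"# {title}", "", f"Opened: {opened.isoformat()}", "Status: open", ""]
    for name, purpose in SECTIONS:
        # Each section starts with a comment reminding me what belongs there.
        lines += [f"## {name}", "", f"<!-- {purpose} -->", ""]
    return "\n".join(lines)


def create_follow_up(directory: Path, slug: str, title: str, opened: date) -> Path:
    """Write the skeleton to <directory>/<date>-<slug>.md without overwriting."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{opened.isoformat()}-{slug}.md"
    if path.exists():
        raise FileExistsError(path)
    path.write_text(render_follow_up(title, opened), encoding="utf-8")
    return path
```

Refusing to overwrite an existing file is deliberate: a follow-up file is supposed to accumulate history, so a second run should never replace what is already there.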

Why one markdown file is often enough

I do not need a huge tracker to get value here.

One file per follow-up works because it is:

  • easy to diff
  • easy to link from a registry page
  • easy to archive later
  • easy to extend when the story changes

What matters is that the follow-up is durable and evidence-backed, not that it looks like an enterprise ticket system.
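As a rough illustration of the registry idea, the sketch below builds a registry page from a directory of follow-up files. It assumes each file starts with a `# Title` heading and carries a `Status:` line; both are hypothetical conventions of mine, and the function names are made up for this example.

```python
from pathlib import Path


def read_title_and_status(path: Path) -> tuple[str, str]:
    """Pull the first '# ' heading and the 'Status:' line out of one file."""
    title = None
    status = "unknown"
    for line in path.read_text(encoding="utf-8").splitlines():
        if title is None and line.startswith("# "):
            title = line[2:].strip()
        elif line.lower().startswith("status:"):
            status = line.split(":", 1)[1].strip()
    # Fall back to the filename when the file has no heading yet.
    return title or path.stem, status


def build_registry(directory: Path) -> str:
    """Return a markdown list with one linked entry per follow-up file."""
    rows = ["# Follow-up registry", ""]
    for path in sorted(directory.glob("*.md")):
        title, status = read_title_and_status(path)
        rows.append(f"- [{title}]({path.name}) ({status})")
    return "\n".join(rows) + "\n"
```

The function returns the page as a string rather than writing it, so the caller decides where the registry lives and it never ends up listing itself.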

The real advantage

The biggest benefit is not organization. It is time.

When the same topic resurfaces, I want to answer these questions quickly:

  • what exactly happened last time?
  • what did we change already?
  • what metric were we supposed to keep watching?
  • which artifact folder contains the supporting data?

That is much easier with a live follow-up file than with memory plus old shell output.

Failure case: the unresolved item that looks resolved

One of the most common mistakes in operational work is treating "the immediate symptom improved" as equivalent to "the issue is closed."

That hides important follow-up questions like:

  • did total demand recover or only one slice?
  • did the workaround create a new imbalance?
  • has a monitoring threshold become misleading?

Those are exactly the questions I want to keep alive explicitly.

What I'd do differently now

I used to write big postmortems and almost no living follow-up notes. Now I create the follow-up file as soon as I know the issue needs days or weeks of observation, even if the initial analysis is not perfect yet.