Sunday, May 17, 2026

Why Monitoring Needs Host, Hosting, and Global Views at the Same Time

By Vicente Arteaga Gomez

MisLinux · Last updated: May 5, 2026

This is part of my Kubernetes-on-Hetzner-and-operations series on MisLinux. It is based on real operational monitoring design work, not a generic observability checklist.

One of the easiest ways to build a misleading dashboard is to choose only one monitoring scope.

If I watch only the global service, I can miss a saturated origin.

If I watch only the individual host, I can page on a healthy rebalance.

If I watch only the provider slice, I can miss a cross-provider traffic cliff.

That is why I now want three views at the same time:

View    | Main question
Host    | Is one concrete machine or origin sick, stale, or overloaded?
Hosting | Is one provider/location carrying the wrong share or drifting operationally?
Global  | Is the public service really up, healthy, and serving normal demand overall?

What each level catches

Host view

The host view is where I want:

  • latency
  • CPU or saturation signals
  • config freshness
  • health endpoint state
  • reload or restart anomalies

This is how I catch "one box is bad" before it becomes "the system feels random."
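A minimal sketch of what a host-level check could look like, to make the list above concrete. The `HostSample` fields, the `host_problems` helper, and all three thresholds are assumptions for illustration, not values from my real setup.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    """One observation of a single origin (hypothetical field names)."""
    host: str
    healthy: bool          # result of the health endpoint probe
    latency_ms: float      # recent request latency
    cpu_util: float        # 0.0 to 1.0
    config_age_s: float    # seconds since the last successful config sync
    reload_failed: bool    # whether the most recent reload failed


# Illustrative thresholds only; tune them per service.
MAX_LATENCY_MS = 500.0
MAX_CPU_UTIL = 0.90
MAX_CONFIG_AGE_S = 900.0


def host_problems(sample: HostSample) -> list[str]:
    """Return the host-scoped problems found in one sample."""
    problems = []
    if not sample.healthy:
        problems.append("health endpoint failing")
    if sample.latency_ms > MAX_LATENCY_MS:
        problems.append("latency high")
    if sample.cpu_util > MAX_CPU_UTIL:
        problems.append("cpu saturated")
    if sample.config_age_s > MAX_CONFIG_AGE_S:
        problems.append("config stale")
    if sample.reload_failed:
        problems.append("last reload failed")
    return problems


if __name__ == "__main__":
    sample = HostSample("origin-a", True, 120.0, 0.95, 1800.0, False)
    print(host_problems(sample))
```

Every finding here is about one machine. Nothing in this function is allowed to conclude anything about a provider or about the public service.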

Hosting view

The hosting view groups hosts by provider or operational domain.

That is where I can see:

  • traffic imbalance
  • one provider falling behind on config freshness
  • one provider serving the wrong share of requests
  • one provider drifting in latency

This matters even if a hosting group contains only one host today, because the grouping is an operational concept, not just a host count.
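As a rough sketch of that grouping, assuming each sample is a plain dict with hypothetical `host`, `hosting`, `requests_per_s`, and `config_age_s` keys, the hosting view could be computed like this:

```python
from collections import defaultdict


def hosting_summary(samples):
    """Roll host samples up into one summary per hosting group."""
    samples = list(samples)
    total_rps = sum(s["requests_per_s"] for s in samples)

    groups = defaultdict(list)
    for s in samples:
        groups[s["hosting"]].append(s)

    summary = {}
    for hosting, members in groups.items():
        rps = sum(m["requests_per_s"] for m in members)
        ages = [m["config_age_s"] for m in members]
        summary[hosting] = {
            "host_count": len(members),
            "requests_per_s": rps,
            # Share of all observed traffic carried by this group.
            "traffic_share": rps / total_rps if total_rps else 0.0,
            # Gap between the freshest and the stalest config in the group.
            "freshness_spread_s": max(ages) - min(ages),
        }
    return summary


if __name__ == "__main__":
    demo = [
        {"host": "a1", "hosting": "provider-a", "requests_per_s": 30, "config_age_s": 60},
        {"host": "a2", "hosting": "provider-a", "requests_per_s": 10, "config_age_s": 600},
        {"host": "b1", "hosting": "provider-b", "requests_per_s": 60, "config_age_s": 45},
    ]
    for name, row in hosting_summary(demo).items():
        print(name, row)
```

A group with a single host still gets its own row, with a freshness spread of zero. The group exists as soon as the label exists.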

Global view

The global view answers the question that users care about:

> Is the public service healthy overall?

That is the view where I want:

  • total public traffic
  • total error rate
  • canary health across all active public origins
  • global per-UUID or per-surface demand cliffs

Without this layer, it is too easy to confuse a redistribution event with a real outage.
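A hedged sketch of the global rollup, assuming each origin is described by a dict with hypothetical `requests_per_s`, `errors_per_s`, `active`, and `canary_ok` keys:

```python
def global_view(origins):
    """Collapse every public origin into one service-wide summary."""
    origins = list(origins)
    total_rps = sum(o["requests_per_s"] for o in origins)
    total_errors = sum(o["errors_per_s"] for o in origins)
    # Canary health only counts origins that are currently meant to serve.
    active = [o for o in origins if o["active"]]
    return {
        "total_requests_per_s": total_rps,
        "error_rate": total_errors / total_rps if total_rps else 0.0,
        "canaries_ok": sum(1 for o in active if o["canary_ok"]),
        "canaries_total": len(active),
    }


if __name__ == "__main__":
    demo = [
        {"requests_per_s": 40, "errors_per_s": 1, "active": True, "canary_ok": True},
        {"requests_per_s": 60, "errors_per_s": 1, "active": True, "canary_ok": False},
        {"requests_per_s": 0, "errors_per_s": 0, "active": False, "canary_ok": False},
    ]
    print(global_view(demo))
```

The output deliberately carries no host or provider name. Its only job is to say whether the service as a whole is serving.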

The architecture I prefer

[Figure: monitoring scope diagram]

The point of that diagram is that each level aggregates differently. They are not interchangeable charts with different legends.

A practical example

Imagine public traffic is split across two providers:

  • provider A carries less traffic than usual
  • provider B carries more traffic than usual
  • the total public request rate is stable

If I alert only on provider A, I page on a healthy balance shift.

If I alert only on global traffic, I miss the provider imbalance.

If I alert only on per-host CPU, I may not notice that one provider is now becoming the hidden bottleneck.

The right answer is to keep all three views and interpret them together.
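The example above can be sketched as a small classifier that compares per-provider rates against a baseline while also checking the global total. The `classify` function and the 20% `tolerance` are assumptions chosen for illustration.

```python
def classify(baseline, current, tolerance=0.2):
    """Compare provider request rates (provider -> requests per second).

    Returns a (verdict, shifted_providers) tuple.
    """
    def changed(before, after):
        # Relative change beyond the tolerance counts as a shift.
        if before == 0:
            return after != 0
        return abs(after - before) / before > tolerance

    providers = sorted(set(baseline) | set(current))
    shifted = [
        p for p in providers
        if changed(baseline.get(p, 0.0), current.get(p, 0.0))
    ]
    global_changed = changed(sum(baseline.values()), sum(current.values()))

    if global_changed:
        return "global demand change", shifted
    if shifted:
        return "rebalance between providers", shifted
    return "stable", shifted


if __name__ == "__main__":
    baseline = {"A": 50.0, "B": 50.0}
    print(classify(baseline, {"A": 30.0, "B": 70.0}))
    print(classify(baseline, {"A": 10.0, "B": 50.0}))
```

The first call reports a rebalance, because the total is unchanged while both providers moved. The second reports a global change, because the total itself dropped. A rebalance verdict is not an all-clear: it still deserves a look at the hosting view, just not a public outage page.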

Metrics I now want at each level

Scope   | Metrics I care about first
Host    | health, latency, CPU, config age, recent reload status
Hosting | total requests, mean latency, error rate, freshness spread, host count
Global  | total public requests, total errors, demand cliffs, public canary result

That table looks obvious after the fact, but it took me longer than it should have to stop flattening these scopes into one graph.

Failure case: the wrong dashboard story

The wrong dashboard story sounds like this:

> "Traffic is down on one origin, so the public service is down."

That sentence mixes scopes. It takes a host observation and turns it into a global conclusion.

Once I noticed that pattern, a lot of monitoring noise suddenly made more sense.

Command trail I want during validation

# Host-level health
curl -s https://origin-a.example.invalid/health

# Hosting-level request split
python3 inspect-request-split.py --group-by hosting

# Global service trend
python3 inspect-request-split.py --group-by global

The exact implementation can vary. The important part is that the queries do not collapse distinct operational questions into one series.
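To illustrate the idea of one dataset producing separate series per scope, here is a hypothetical sketch of a group-by helper. It is not the contents of `inspect-request-split.py`; the `request_split` function, the `SCOPE_KEYS` mapping, and the sample keys are all assumptions.

```python
from collections import defaultdict

# Each scope maps a sample to the identity it is aggregated under.
SCOPE_KEYS = {
    "host": lambda s: s["host"],
    "hosting": lambda s: s["hosting"],
    "global": lambda s: "global",
}


def request_split(samples, group_by):
    """Sum requests per second under the identity chosen by group_by."""
    key = SCOPE_KEYS[group_by]  # raises KeyError for an unknown scope
    totals = defaultdict(float)
    for s in samples:
        totals[key(s)] += s["requests_per_s"]
    return dict(totals)


if __name__ == "__main__":
    demo = [
        {"host": "a1", "hosting": "provider-a", "requests_per_s": 30.0},
        {"host": "a2", "hosting": "provider-a", "requests_per_s": 10.0},
        {"host": "b1", "hosting": "provider-b", "requests_per_s": 60.0},
    ]
    for scope in ("host", "hosting", "global"):
        print(scope, request_split(demo, scope))
```

The same three samples yield three series, two series, or one series depending on the key, which is why every sample needs both a host and a hosting identity attached when it is recorded.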

What I'd do differently now

I used to think multiple scopes were a nice refinement for later. I now think they are the minimum design for any public service that can move traffic across origins or providers. If I were rebuilding the monitoring from scratch, I would model host, hosting, and global identity from day one.