Kubernetes on Hetzner: The Series Index
This is the index for my ongoing series on building and operating a production-grade Kubernetes cluster on Hetzner Cloud. The published articles now cover the full lifecycle: from the initial decision to choose Hetzner, through cluster bootstrapping, networking, storage, day-2 operations, cost analysis, monitoring, multi-architecture image handling, and the first real postmortems that only show up after the platform is in steady use.
Every article is based on a cluster I operate for real production workloads. The tradeoffs, costs, failure modes, and configuration choices in each post come from direct experience, not from lab exercises or vendor documentation summaries.
I am not affiliated with, sponsored by, or endorsed by Hetzner or any other vendor mentioned in this series.
If you are new to the site, this page is the best place to start. It is intentionally more useful than a date-based archive because it explains what each article covers, what order makes sense, and which posts are foundational versus optional.
The series began as a finite setup arc, but it now continues as an operations-focused reference. I update this page whenever a new published post extends the same Hetzner/Kubernetes track.
Suggested reading order
- Start with Part 1 if you are still deciding whether Hetzner is the right fit.
- Read Parts 2, 3, and 4 in order if you are building a new cluster.
- Read Part 5 before production cutover, not after.
- Read Part 6 if you need to justify the platform choice in cost terms.
- Read the later monitoring, retention, and multi-architecture posts once the cluster is stable enough that day-2 issues matter more than initial bootstrapping.
What the series is optimized for
This series is written for operators who want:
- lower infrastructure spend than a typical managed-cloud setup
- direct control over networking, upgrades, and image publishing
- practical failure modes and tradeoffs instead of idealized tutorials
- a small-team operating model where infrastructure decisions must be cost-aware
---
Part 1 — Why I Chose Hetzner for a Small Kubernetes Cluster
Published: March 18, 2026
The opening article covers the decision logic: why Hetzner's pricing and infrastructure model is attractive for small production clusters, what you give up versus a managed cloud, and who this tradeoff is actually right for. Includes a breakdown of what makes managed-cloud cost painful at small scale and why cost-per-control-unit matters more than raw feature count.
---
Part 2 — How I Bootstrap a Kubernetes Cluster on Hetzner
Published: March 2026
The bootstrapping walkthrough: node roles, k3s setup, first-run configuration decisions, and the choices that are easy to get wrong on the first attempt. Covers network assumptions, node preparation, and why I set up certain things before the cluster comes up rather than after.
---
Part 3 — Networking, Firewalls, and Flannel on Hetzner
Published: March 2026
How Flannel VXLAN works across Hetzner nodes, why the private network IP assignment matters for overlay routing, and how to set up the Hetzner Cloud Firewall rules that inter-node traffic needs. Includes the specific rules required for ICMP and UDP/8472 and why getting them wrong causes silent failures.
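As a rough illustration of the kind of check this implies, the following Python sketch compares a list of firewall rules against the two inter-node requirements named above (ICMP and UDP/8472). The `Rule` type, the `10.0.0.0/16` private range, and the helper name are hypothetical; they are not taken from the article or from the Hetzner API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified inbound firewall rule."""
    protocol: str           # "icmp", "udp", or "tcp"
    port: Optional[int]     # None for ICMP, which has no ports
    source_cidr: str


def missing_internode_rules(rules: List[Rule], private_cidr: str) -> List[Rule]:
    """Return the required inter-node rules that are absent from `rules`."""
    required = [
        Rule("icmp", None, private_cidr),   # ICMP between nodes
        Rule("udp", 8472, private_cidr),    # Flannel VXLAN traffic
    ]
    present = set(rules)
    return [rule for rule in required if rule not in present]


# Assumed example: the VXLAN rule exists, the ICMP rule was forgotten.
example_rules = [
    Rule("tcp", 6443, "10.0.0.0/16"),
    Rule("udp", 8472, "10.0.0.0/16"),
]
missing = missing_internode_rules(example_rules, "10.0.0.0/16")
print(missing)
```

An exact-match comparison like this is deliberately strict. A real check would also need to accept broader rules that already cover the required traffic.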
---
Part 4 — Storage, Ingress, and TLS for Production Services
Published: March 2026
The persistence and access layer: PersistentVolumeClaims with Hetzner volumes, nginx-ingress setup, cert-manager with Let's Encrypt, and external-dns for Cloudflare-managed DNS automation. Includes the integration between all four and the failure modes that appear when they are assembled in the wrong order.
---
Part 5 — Day-2 Operations: Backups, Monitoring, and Upgrades
Published: March 2026
What happens after the initial setup is done. Covers backup strategy and restore testing, practical monitoring that answers operational questions rather than filling dashboards, and node upgrade and replacement procedures. Includes a backup schedule reference table and a monitoring checklist.
---
Part 6 — What Kubernetes on Hetzner Costs, and When I Wouldn't Use It
Published: March 2026
A realistic cost analysis using real Hetzner pricing: compute, volumes, IPv4 addresses, backups, and the operational overhead that does not appear on the invoice. Includes a comparison with managed Kubernetes at the same node count and a decision matrix for when Hetzner makes sense versus when it does not.
---
Standalone: How I Use CIR Notes to Make AI Coding Agents More Useful
Published: March 2026
A standalone article (not part of the Hetzner series) about a documentation practice I use when working with AI coding assistants like GitHub Copilot, Claude, and OpenAI Codex. The short version: agents work better when the codebase contains explicit Context, Intent, and Rationale notes near the code they will modify. This post explains why and how. Not affiliated with Microsoft, Anthropic, or OpenAI.
---
Part 7 — Lean Monitoring for a Small Kubernetes Cluster on Hetzner
Published: April 2026
How I built a practical monitoring stack (Prometheus, Grafana, node-exporter, and custom exporters) that answers real operational questions without requiring a dedicated ops team. Covers the specific dashboards and alert rules that catch the failure modes a small cluster actually encounters.
---
Part 8 — Handling Multi-Architecture Images in a Mixed-Architecture Kubernetes Cluster
Published: April 7, 2026
How to manage Docker images when your cluster has both ARM64 and AMD64 nodes. Covers multi-arch manifest lists, the `docker buildx imagetools create` workflow for pushing a single architecture without overwriting the other, and the specific failure modes that happen when manifest lists and garbage collection interact badly.
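To give a sense of the shape of that workflow, here is a minimal Python sketch that only assembles the argument list for `docker buildx imagetools create`. It assumes each architecture has already been pushed under its own tag. The registry, image name, tags, and helper function are hypothetical, and the article's actual tagging scheme may differ.

```python
from typing import List


def imagetools_create_args(target_tag: str, source_refs: List[str]) -> List[str]:
    """Build the argv for combining existing per-arch images into one manifest list."""
    if not source_refs:
        raise ValueError("at least one source image reference is required")
    # `-t` names the manifest list to create; the remaining arguments are the
    # already-pushed images (or digests) that the list should point at.
    return ["docker", "buildx", "imagetools", "create", "-t", target_tag, *source_refs]


# Assumed example: a freshly pushed arm64 image is combined with the amd64
# image that is already in the registry, so neither entry is dropped.
args = imagetools_create_args(
    "registry.example.com/app:1.0",
    [
        "registry.example.com/app:1.0-amd64",
        "registry.example.com/app:1.0-arm64",
    ],
)
print(" ".join(args))
```

The list can be passed to `subprocess.run(args, check=True)` on a machine where Docker with buildx is installed and logged in to the registry. Building the arguments separately keeps the sketch testable without touching a registry.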
---
Part 9 — Why My Prometheus History Suddenly Disappeared, and What I Changed Afterward
Published: April 20, 2026
A postmortem about a silent Prometheus history-loss incident on a small cluster. Covers the interaction between time retention, TSDB size caps, WAL replay, and why a monitoring stack can look healthy while the historical evidence you rely on has already been deleted.
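The retention interaction can be approximated with simple arithmetic. Prometheus applies both `--storage.tsdb.retention.time` and `--storage.tsdb.retention.size`, and whichever limit is reached first determines how much history survives. The Python sketch below is a back-of-the-envelope estimate only: the ingestion rate and limits are made-up numbers, not figures from the incident, and it ignores block granularity and WAL overhead.

```python
GIB = 1024 ** 3


def effective_history_days(retention_days: float,
                           size_cap_bytes: float,
                           ingest_bytes_per_day: float) -> float:
    """Estimate how many days of history survive when both limits are set."""
    if ingest_bytes_per_day <= 0:
        raise ValueError("ingest_bytes_per_day must be positive")
    # Days of data that fit under the size cap at a steady ingestion rate.
    days_until_cap = size_cap_bytes / ingest_bytes_per_day
    # The stricter of the two limits wins.
    return min(retention_days, days_until_cap)


# Hypothetical numbers: 90 days configured, 50 GiB cap, 2 GiB ingested per day.
days = effective_history_days(90, 50 * GIB, 2 * GIB)
print(days)  # 25.0
```

With these assumed values the size cap wins, so roughly 25 days of history remain even though 90 days are configured. This is the kind of gap that can go unnoticed while dashboards for recent data still look healthy.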
---
About this series
All articles in this series describe a real cluster I operate. When something went wrong before it went right, I say so. When a configuration choice has a cost, I try to quantify it. The goal is not a tutorial that produces a clean lab setup; it is a reference that survives the first real incident.
For questions or corrections: mislinuxtech@gmail.com