Wednesday, March 18, 2026

How I Bootstrap a Kubernetes Cluster on Hetzner

By Vicente Arteaga Gomez

MisLinux · Last updated: May 5, 2026

This article is part 2 of my MisLinux series on running Kubernetes on Hetzner. The workflow and opinions here come from my own operational experience, and I am not affiliated with or sponsored by Hetzner.

The first version of a cluster shapes almost every operational decision that follows. If the bootstrap stage is rushed, the team spends the following months compensating for avoidable mistakes. On Hetzner, that matters even more because you manage more of the infrastructure detail yourself.

[Figure: Bootstrap flow from repo decisions to a usable Hetzner cluster]

Start with simple roles

For a small production cluster, I prefer clear node roles from the beginning. Even at that size, role clarity pays off quickly.

A basic layout can look like this:

  • one control-plane node for early testing, or three if high availability is a hard requirement
  • one or more worker nodes for application workloads
  • optional specialized worker nodes later for heavy batch jobs, storage-intensive pods, or ingress-heavy traffic

At the beginning, I optimize for recoverability and simplicity, not theoretical perfection. A simple cluster that is documented and easy to rebuild is usually safer than a complicated cluster that nobody fully understands.

Decide your node IP strategy early

One of the most important early decisions is how nodes identify themselves inside the cluster. On Hetzner, this matters because you may have both public and private addresses available. The worst outcome is mixing assumptions, with some nodes registering on the private network and others on their public address.

Pick a strategy and stay consistent across all nodes. If the cluster networking layer expects node-to-node communication over one address space, every node needs to follow that same pattern. In practice, many painful networking problems are not Kubernetes bugs at all. They come from inconsistent node addressing or incomplete firewall rules.

The config fragment I care about most now

This is the kind of bootstrap detail I now want written down immediately, because it shapes everything that follows:

# /etc/rancher/k3s/config.yaml
node-ip: 10.0.1.10
flannel-iface: enp7s0
node-external-ip: 167.235.x.x
tls-san:
  - 167.235.x.x

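Those four settings matter more than many people expect. node-ip and flannel-iface pin cluster traffic to the private network, while node-external-ip and tls-san tell the cluster and the API server certificate about the public address. If the node identity is fuzzy on day one, the future debugging story will also be fuzzy.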
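A minimal sketch of a pre-start check for that fragment: it confirms that the configured node-ip is actually assigned to the configured flannel-iface. The helper names `k3s_config_value` and `ip_is_on_iface` are hypothetical, and the parser only understands simple top-level `key: value` lines like the ones above, not lists or quoted values.

```shell
# Print the value of a simple top-level "key: value" line from a k3s config file.
# Hypothetical helper: it does not handle lists, quoting, or nested YAML.
k3s_config_value() {
  awk -v key="$1" -F': *' '$1 == key { print $2; exit }' "$2"
}

# Succeed if the given IPv4 address appears in `ip -4 -o addr show dev <iface>`
# output supplied on stdin (the address is the 4th field, e.g. 10.0.1.10/32).
ip_is_on_iface() {
  awk -v want="$1" '{ split($4, a, "/"); if (a[1] == want) found = 1 }
                    END { exit found ? 0 : 1 }'
}

# Usage on a node:
#   cfg=/etc/rancher/k3s/config.yaml
#   ip -4 -o addr show dev "$(k3s_config_value flannel-iface "$cfg")" \
#     | ip_is_on_iface "$(k3s_config_value node-ip "$cfg")" \
#     || echo "node-ip is not assigned to flannel-iface" >&2
```

Reading the `ip` output from stdin keeps the comparison logic testable on a machine that does not have the interface at all.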

Keep bootstrap automation explicit

I prefer cloud-init or equivalent provisioning that makes node setup reproducible. Even when the automation is simple, it should answer these questions clearly:

  • which packages are installed
  • how the container runtime is configured
  • how Kubernetes components are installed
  • which kernel or networking settings are required
  • how the node joins the cluster
  • which labels or taints apply

This makes later node replacement far less stressful. If a worker dies or needs to be recreated, recovery should feel like running a procedure, not improvising from memory.
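The join step from the list above could be captured in a small script so a replacement worker gets the same identity settings every time. This is a sketch under assumptions: `render_agent_config` is a hypothetical helper, the addresses and interface name are placeholders, and the server URL and token in the commented join command must come from your own cluster (on a k3s server the token is stored in /var/lib/rancher/k3s/server/node-token).

```shell
# Print a k3s agent config for one worker.
# Arguments: private node IP, private interface name, public IP.
render_agent_config() {
  cat <<EOF
node-ip: $1
flannel-iface: $2
node-external-ip: $3
EOF
}

# Example provisioning steps, run as root. They are left commented out so that
# sourcing this file never changes a machine:
#   mkdir -p /etc/rancher/k3s
#   render_agent_config 10.0.1.11 enp7s0 203.0.113.11 > /etc/rancher/k3s/config.yaml
#   curl -sfL https://get.k3s.io | K3S_URL=https://10.0.1.10:6443 K3S_TOKEN=<node-token> sh -
```

The same three steps fit into a cloud-init runcmd section; the point is that the values live in a file under version control rather than in shell history.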

A real bootstrap failure case I would plan around now

One of the most annoying failures I ran into later was not a Kubernetes bug at all. It was bootstrap/config drift:

  • a node advertising the wrong internal address space
  • a firewall dropping the packets the overlay needed
  • cloud-init/runcmd details that looked harmless until a new worker had to join under time pressure

The practical lesson was simple: if a node cannot be rebuilt from explicit files, the bootstrap is unfinished no matter how nice the first cluster looked.
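The firewall half of that drift can be written down as data. The sketch below lists the inbound ports that the k3s documentation gives for a default install with the flannel VXLAN backend, and prints matching allow rules for the private network. Assumptions to check against your own setup: the 10.0.1.0/24 subnet is a placeholder inferred from the 10.0.1.x addresses in this article, ufw is only an example of a host firewall, and `print_cluster_rules` is a hypothetical helper that prints commands without running them.

```shell
# Print ufw rules that allow cluster-internal traffic from one private subnet.
# Ports: k3s defaults with the flannel VXLAN backend and a single server.
# 6443 only needs to be reachable on the server node; a three-server setup
# with embedded etcd also needs tcp 2379-2380 between the servers.
print_cluster_rules() {
  subnet="$1"
  while read -r proto port purpose; do
    echo "ufw allow from $subnet to any port $port proto $proto  # $purpose"
  done <<EOF
tcp 6443 Kubernetes API server
udp 8472 flannel VXLAN overlay
tcp 10250 kubelet
EOF
}

print_cluster_rules 10.0.1.0/24
```

Keeping the port list in one place means the firewall rules and the node-IP decision can be reviewed in the same change.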

Avoid hidden complexity at day zero

At bootstrap time it is tempting to install every useful add-on immediately. I try not to do that. The first useful version of the cluster only needs a short list of essentials:

  • Kubernetes itself
  • a CNI plugin
  • ingress if public traffic is needed
  • cert-manager or another TLS automation path if the blog or app is public
  • a basic storage approach for workloads that require persistence
  • logging and monitoring soon after, but not necessarily at minute one

The goal is to earn complexity only when it solves a real problem.

Think about failure before first deploy

A cluster should be designed around failure modes, not just the happy path. Before I deploy workloads, I want clear answers to the following:

  • What happens if a node disappears?
  • What happens if a worker is recreated with a different IP?
  • What happens if ingress fails during a certificate renewal?
  • What happens if a persistent volume cannot be attached?
  • What happens if an upgrade needs rollback?

These questions sound dramatic, but they are exactly what separates a functional cluster from a production-ready one.

Minimum production safeguards

For a small but real cluster, I consider these minimum safeguards worth having early:

  • infrastructure defined in versioned files where possible
  • node configuration documented and reproducible
  • firewall rules reviewed for cluster internals and public services
  • backups planned before important stateful workloads go live
  • TLS termination handled consistently
  • a tested method for draining and replacing nodes

None of this needs to be enterprise-scale. It just needs to be intentional.
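For the last safeguard in that list, the replacement procedure could be kept as a script that prints its steps, so it can be reviewed before anything is run. A minimal sketch, assuming a plain worker whose pods can be rescheduled elsewhere; `replace_worker_plan` is a hypothetical helper and the node name is a placeholder.

```shell
# Print the commands for taking one worker out of the cluster.
# Nothing is executed; review the output, then run the steps by hand.
replace_worker_plan() {
  node="$1"
  # drain marks the node unschedulable and evicts its pods
  echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
  echo "kubectl delete node $node"
  echo "# then recreate the machine and let it rejoin using the documented join process"
}

replace_worker_plan worker-node-0
```

Note that --delete-emptydir-data discards emptyDir contents on the drained node, so this fits only workloads that keep durable state in persistent volumes.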

My bootstrap checklist before I call a cluster "usable"

Before I treat a fresh cluster as ready for real workloads, I want a short checklist to be true:

  • the node naming pattern is obvious
  • the join process is documented end to end
  • I can explain which IP space the CNI depends on
  • basic ingress and TLS expectations are decided, even if they are not fully deployed yet
  • I know how I would replace the first failed worker without improvising

That checklist sounds simple, but it prevents the cluster from becoming a collection of half-made decisions.

A concrete bootstrap command trail

These are the commands whose output I want to look boring right after the first worker joins:

kubectl get nodes -o wide
kubectl get pods -A
kubectl -n kube-system get pods -o wide
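# Flannel config: k3s's embedded flannel keeps it on the node rather than in a ConfigMap
sudo cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json
# Standalone flannel deployments expose it as a ConfigMap (namespace depends on the manifest version)
kubectl -n kube-system get configmap kube-flannel-cfg -o yaml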

And the output I want to see is boring in a very specific way:

NAME            STATUS   ROLES                  INTERNAL-IP   EXTERNAL-IP
master-node     Ready    control-plane,master   10.0.1.x      167.235.x.x
worker-node-0   Ready    <none>                 10.0.1.x      167.235.x.x
vpaidd-worker   Ready    <none>                 10.0.1.x      46.224.x.x

If those outputs already raise questions I cannot answer, I do not yet consider the cluster ready for meaningful workloads.
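That "boring" test can be made mechanical. The sketch below reads `kubectl get nodes -o wide --no-headers` output on stdin and reports any node that is not Ready or whose INTERNAL-IP falls outside the expected private prefix. The helper name `check_nodes` and the 10.0.1. prefix are assumptions for illustration; the column positions (STATUS second, INTERNAL-IP sixth) match the usual wide output but are worth confirming on your kubectl version.

```shell
# Read `kubectl get nodes -o wide --no-headers` lines on stdin.
# Columns: NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP ...
# Prints one line per problem and exits non-zero if any were found.
check_nodes() {
  awk -v prefix="$1" '
    $2 != "Ready"          { print $1 ": status is " $2; bad = 1 }
    index($6, prefix) != 1 { print $1 ": internal IP " $6 " is outside " prefix "*"; bad = 1 }
    END                    { exit bad ? 1 : 0 }
  '
}

# Usage against a live cluster:
#   kubectl get nodes -o wide --no-headers | check_nodes 10.0.1.
```

A cordoned node shows up as Ready,SchedulingDisabled and is reported too, which is usually what you want right after bootstrap.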

Bootstrap is where discipline begins

The biggest lesson I keep learning is that bootstrap is not only technical. It is where operational discipline starts. If naming, addressing, firewalling, and node responsibilities are loose on the first day, that looseness spreads into every future decision.

Hetzner gives enough flexibility to do things well, but that flexibility comes with responsibility. When the initial cluster is small, this is actually a benefit. It is easier to build good habits when the system still fits in your head.

What I'd do differently now

If I were starting the same cluster again, I would capture two things earlier:

  1. the exact node-IP / firewall contract in the same change that creates the first nodes
  2. the replacement procedure for the first worker before any real workload lands

Both became important later, and both are easier to write when the cluster is still small than during the first stressful rebuild.

Series note

This is part 2 of the series, so I am still intentionally staying close to first principles: node roles, addressing, and recoverability. Later articles assume these bootstrap choices are already disciplined.

In the next article, I will focus on the part that tends to cause the most surprise: networking, firewalls, and the way container networking depends on consistent node communication.