
Securing Multi-Cloud Kubernetes: Talos, KubeSpan, and Tailscale

Deploy a production-ready multi-cloud Kubernetes cluster using Talos OS (installed via kexec hot-swap), KubeSpan's encrypted mesh, and Tailscale-secured management.

Krishna C

September 23, 2025

6 min read

TL;DR

I built a multi-cloud Kubernetes cluster using Talos OS (deployed via kexec hot-swap), KubeSpan for encrypted pod networking, and Tailscale for secure management access. No exposed APIs, no VPN complexity—just add any VPS and it joins automatically.

Running Kubernetes across different cloud providers usually means dealing with incompatible networks, manual OS installations, and exposed management APIs. I needed a way to connect nodes securely, encrypt pod traffic across clouds, and lock down administrative access—without drowning in VPN complexity.

The Stack

Component   Purpose
---------   -------
Talos OS    Immutable Kubernetes OS deployed via kexec hot-swap
KubeSpan    WireGuard mesh for encrypted pod-to-pod communication
Flannel     Lightweight CNI that works everywhere
Tailscale   Zero-trust network access for cluster management
Traefik     Gateway API controller for public traffic

Infrastructure managed with OpenTofu and Terragrunt, using Supabase PostgreSQL for remote state.
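
For reference, a minimal sketch of the state backend wiring, assuming a backend "pg" block is declared in the root module (the connection string is a placeholder for a Supabase database, and in practice Terragrunt injects it per stack):

# Hypothetical: initialize OpenTofu against a PostgreSQL remote state backend.
# The conn_str value below is a placeholder, not my real configuration.
tofu init \
  -backend-config="conn_str=postgres://user:[email protected]:5432/tfstate"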

Network Architecture

┌─────────────────────────────────────────────────────────────────┐
│                            Internet                             │
└────────────┬────────────────────────────────────┬───────────────┘
             │                                    │
     ┌───────▼─────────┐                  ┌───────▼────────┐
     │ VPS Provider A  │                  │ VPS Provider B │
     │  (Public IPs)   │                  │  (Public IPs)  │
     └───────┬─────────┘                  └───────┬────────┘
             │                                    │
     ┌───────▼────────────────────────────────────▼───────┐
     │          KubeSpan WireGuard Mesh (51820)           │
     │       Encrypted pod-to-pod across all nodes        │
     │       10.244.0.0/16 pod network via Flannel        │
     └───────┬────────────────────────────────────────────┘
             │
     ┌───────▼────────────────────────────────────────────┐
     │    Kubernetes Cluster (Control Plane + Workers)    │
     │   - DNS Round-Robin to CP nodes (6443)             │
     │   - Traefik on host network (80/443)               │
     │   - Tailscale operator for internal routes         │
     └───────┬────────────────────────────────────────────┘
             │
     ┌───────▼────────────────────────────────────────────┐
     │            Tailscale Management Network            │
     │   API access (50000, 6443) restricted to:          │
     │   - Tailscale network (100.64.0.0/10)              │
     │   - Firewall blocks public API access              │
     └────────────────────────────────────────────────────┘

Public Traffic:  Internet → Public IP:80/443 → Traefik → Pods
Management:      Admin → Tailscale VPN → CP Tailscale IP:6443 → API
Pod-to-Pod:      Pod A → Flannel → KubeSpan → Internet → KubeSpan → Pod B

Talos OS

Talos is an immutable, API-only OS built for Kubernetes. No SSH, no shell, no package manager. What sold me: I can deploy it via kexec, with no custom ISO, no rescue console, no reinstall workflow.

The VPS boots whatever the provider gives it. I SSH in once, run the deployment script, and minutes later it's running Talos. The OS hot-swaps itself while running.

I've tested this on Hetzner, OVH Cloud, and Contabo. The deployment script auto-detects network configuration and handles the kexec boot.

My infrastructure code configures:

  • LUKS2 full disk encryption (state and ephemeral partitions; see the sketch after this list)
  • Tailscale system extension for management access
  • KubeSpan WireGuard mesh for pod networking
  • Flannel CNI overlay
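
For the disk encryption piece, a minimal sketch of the relevant Talos machine config, assuming node-ID-sealed LUKS2 keys (the key management choice is my assumption, not a requirement):

machine:
  systemDiskEncryption:
    # Encrypt both the STATE and EPHEMERAL partitions with LUKS2,
    # sealing the key to the node identity
    state:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    ephemeral:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0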

How Kexec Works

Kexec loads a new kernel into memory and boots it without going through BIOS/firmware. This lets me replace the OS on a VPS without touching the provider's console.

Here's what my deployment script does:

Step 1: Network Discovery

Before executing kexec, the script detects the existing network configuration to ensure Talos boots with correct network settings:

# Auto-detect current network configuration
IP=$(ip -o -4 route get 8.8.8.8 | awk -F"src " '{sub(" .*", "", $2); print $2}')
GATEWAY=$(ip -o -4 route get 8.8.8.8 | awk -F"via " '{sub(" .*", "", $2); print $2}')
ETH=$(ip -o -4 route get 8.8.8.8 | awk -F"dev " '{sub(" .*", "", $2); print $2}')

# Get CIDR prefix length
CIDR=$(ip -o -4 addr show "$ETH" | awk -F"inet $IP/" '{sub(" .*", "", $2); print $2; exit}')

# Get predictable device name using udevadm
DEV=$(udevadm info -q property "/sys/class/net/$ETH" | \
  awk -F= '$1~/ID_NET_NAME_ONBOARD/{print $2; exit} \
           $1~/ID_NET_NAME_PATH/{v=$2} END{if(v) print v}')

This handles the VPS provider quirks I ran into—OVH uses /32 point-to-point networking, Contabo uses standard subnets, and interface names vary wildly.

Step 2: Custom Image Download

The script downloads a custom Talos image with system extensions via the Image Factory:

SCHEMATIC_ID='e2e3b54334c85fdef4d78e88f880d185e0ce0ba0c9b5861bb5daa1cd6574db9b'
TALOS_VERSION='v1.11.5'

# Download kernel
KERNEL_URL="https://factory.talos.dev/image/$SCHEMATIC_ID/$TALOS_VERSION/kernel-amd64"
wget -q -O /root/talos-kexec/vmlinuz "$KERNEL_URL"

# Download initramfs
INITRAMFS_URL="https://factory.talos.dev/image/$SCHEMATIC_ID/$TALOS_VERSION/initramfs-amd64.xz"
wget -q -O /root/talos-kexec/initramfs.xz "$INITRAMFS_URL"

The schematic ID is generated from this configuration:

customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/tailscale
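
The Image Factory hands back that ID when you POST the customization to it:

# Save the customization above as schematic.yaml, then:
curl -s -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics
# => {"id":"e2e3b543..."}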

Step 3: Kernel Parameter Construction

Build the kernel command line with proper network configuration:

# Standard subnet configuration ($NETMASK is the dotted form of $CIDR;
# a conversion helper is sketched below)
CMDLINE="init_on_alloc=1 slab_nomerge pti=on \
  console=tty0 console=ttyS0 printk.devkmsg=on \
  talos.platform=metal \
  ip=$IP::$GATEWAY:$NETMASK::$DEV:::::"
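
One gotcha: the ip= parameter wants a dotted netmask, while Step 1 detected a CIDR prefix length. A small conversion helper (my own sketch, not lifted from the script):

# Convert a prefix length (e.g. 24) into a dotted netmask (255.255.255.0)
cidr_to_netmask() {
  local mask=$(( 0xffffffff ^ ((1 << (32 - $1)) - 1) ))
  echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
}
NETMASK=$(cidr_to_netmask "$CIDR")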

Security-focused kernel parameters:

  • init_on_alloc=1 - Zero memory on allocation (prevents info leaks)
  • slab_nomerge - Prevent slab cache merging (hardens against heap attacks)
  • pti=on - Page Table Isolation (Meltdown mitigation)
  • console=tty0 console=ttyS0 - Dual console output (VGA + serial)
  • talos.platform=metal - Tell Talos it's running on bare metal

Step 4: Execute Kexec

Load the new kernel into memory and immediately boot:

# Load kernel and initramfs into memory
kexec -l /root/talos-kexec/vmlinuz \
  --initrd=/root/talos-kexec/initramfs.xz \
  --command-line="$CMDLINE"

# Execute kexec (immediate reboot into Talos)
kexec -e

The SSH connection drops immediately. When it comes back up, it's running Talos.

I automated the whole thing with OpenTofu. Takes about 2-3 minutes per node.

Why bother with all this?

Most VPS providers only offer Ubuntu, Debian, or CentOS. To run Talos, the options are:

  1. Custom ISO upload (most providers don't support this or charge extra)
  2. Netboot/PXE (pain to set up)
  3. Manual installation (can't automate)

Kexec sidesteps all of that. Boot whatever Linux the provider offers, SSH in once, run the script, done.

KubeSpan: Encrypted Pod Networking

KubeSpan is Talos's built-in WireGuard mesh. It connects all cluster nodes and encrypts pod-to-pod traffic across clouds.

The config is dead simple:

machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true

Nodes find each other and set up WireGuard tunnels automatically. All pod traffic is encrypted, even when crossing the public internet between providers.
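
Once nodes are up, the mesh is easy to sanity-check. Assuming talosctl access (the node address is a placeholder), something like:

# Lists the WireGuard peers a node has established, one entry per peer
talosctl get kubespanpeerstatuses --nodes <node-ip>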

Flannel handles the CNI layer with VXLAN on top of KubeSpan. The pod subnet (10.244.0.0/16) just works across all nodes.
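
In Talos machine config terms, that amounts to something like the following sketch (Flannel is also Talos's default CNI, so the cni block is mostly explicit documentation):

cluster:
  network:
    cni:
      name: flannel
    podSubnets:
      - 10.244.0.0/16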

Why KubeSpan Over Tailscale for Pod Traffic?

I initially tried running pod networking over Tailscale IPs. It was a mess.

Tailscale uses a 1280-byte MTU, and Flannel's VXLAN adds roughly 50 bytes of its own overhead on top, leaving pods an effective MTU well below what most software expects. The result: double encapsulation, packet fragmentation, and random connection drops. Pods on different nodes couldn't reliably talk to each other. I spent hours chasing MTU issues before realizing the architecture was just wrong.

KubeSpan runs WireGuard in kernel space with proper MTU handling. No double tunneling, no fragmentation headaches. It just works.

Tailscale is great for management access (talosctl, kubectl). But for pod networking, keep it separate.

Why Flannel Over Cilium?

I spent considerable time trying to make Cilium work as the CNI, L2 LoadBalancer, and Gateway API provider. The promise of an all-in-one solution was attractive.

The reality: debugging network issues across different cloud providers became a time sink. Each provider has different configurations—OVH uses /32 point-to-point, Hetzner uses standard subnets, some have strict MAC filtering.

Cilium's L2 announcements and socket-based load balancing kept breaking in subtle ways. I spent days troubleshooting why pods couldn't reach services on one provider while the same setup worked fine on another.

Flannel keeps it simple. VXLAN overlay, standard configuration, works the same everywhere. Combined with KubeSpan's encrypted mesh, it provides reliable networking across any VPS provider.

Sometimes boring technology wins.

Tailscale: Securing Management Access

Nodes join the Tailscale network as devices, but only for management access.

After Talos deploys, a post-deployment script:

  1. Waits for Tailscale to initialize on all nodes
  2. Applies firewall rules blocking API ports (50000, 6443) from public internet
  3. Allows these ports only from Tailscale network (100.64.0.0/10)
  4. Updates kubeconfig and talosconfig to use Tailscale endpoints (see the sketch below)
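
Step 4 boils down to repointing clients at the nodes' Tailscale IPs. A minimal sketch, with a placeholder address and cluster name:

# 100.100.10.1 stands in for a control-plane node's Tailscale IP;
# "my-cluster" is a placeholder kubeconfig cluster name
talosctl config endpoint 100.100.10.1
talosctl config node 100.100.10.1
kubectl config set-cluster my-cluster --server=https://100.100.10.1:6443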

Control plane firewall configuration:

# Talos ingress firewall documents (NetworkDefaultActionConfig + NetworkRuleConfig)
apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: kubespan-wireguard
portSelector:
  ports:
    - 51820
  protocol: udp
ingress:
  - subnet: 0.0.0.0/0
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: apid-tailscale-only
portSelector:
  ports:
    - 50000
  protocol: tcp
ingress:
  - subnet: 100.64.0.0/10
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: kube-apiserver-tailscale-only
portSelector:
  ports:
    - 6443
  protocol: tcp
ingress:
  - subnet: 100.64.0.0/10

Now kubectl and talosctl only work through Tailscale:

kubectl cluster-info   # Uses Tailscale endpoints
talosctl version       # Uses Tailscale endpoints

curl https://public-ip:6443   # Connection refused

Public Ingress with Traefik Gateway API

Traefik runs in host network mode to accept public traffic on ports 80 and 443. It implements the Kubernetes Gateway API instead of traditional Ingress.

Gateway API provides:

  • Better separation between infrastructure and routing configuration
  • More expressive routing rules (header matching, path rewrites)
  • Cleaner multi-tenant support

Host network means direct binding to public IPs—no NodePort, no LoadBalancer services, no extra NAT layer.
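
To make that concrete, here's a minimal Gateway plus HTTPRoute sketch; the names, hostname, backend, and the traefik gatewayClassName are assumptions for illustration, not my actual manifests:

# Hypothetical Gateway API objects routed by Traefik
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-gateway
spec:
  gatewayClassName: traefik
  listeners:
    - name: web
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-route
spec:
  parentRefs:
    - name: public-gateway
  hostnames:
    - app.example.com
  rules:
    - backendRefs:
        - name: app-service
          port: 8080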

The Cost

Here's the part that still surprises me: I'm running a fully HA Kubernetes cluster for about $25-30/month.

OVH Cloud gives me 6 vCPU, 12GB RAM nodes for ~$7.70 each with unlimited bandwidth. Three or four of those and I have a production-ready, encrypted, multi-node cluster. No load balancer fees, no egress charges, no managed Kubernetes markup.

For a personal cluster that's actually secure and HA, that's hard to beat.

---

Need more capacity? Spin up a VPS anywhere, run the deployment, and it joins the cluster in under 10 minutes. No VPN certs to manage, no exposed APIs, no networking headaches.

Thoughts? Hit me up at [email protected]

#kubernetes
