Radar: A New Kubernetes IDE Worth Knowing About (vs OpenLens, FreeLens)

If you’ve been following Kubernetes tooling, you’ve probably already been through the Lens saga: Lens went commercial, OpenLens emerged as the community fork, then FreeLens appeared when OpenLens maintenance slowed. The pattern is familiar — a useful desktop tool, a licensing decision, a fork, another fork.

Radar is not a fork. It’s a different approach to the same problem: giving engineers a useful interface for Kubernetes clusters without the friction of kubectl for every task. Built by Skyhook (YC-backed, Google Cloud Partner), it’s been live since 2025, has 1.7k+ GitHub stars, releases weekly, and the founder reaches out to the community directly. That’s usually a good signal that someone is genuinely building in public.

This article covers what Radar actually does, where it pulls ahead of OpenLens and FreeLens, and when those tools are still the right choice.


The State of Kubernetes Desktop Tooling in 2026

Before getting into Radar specifically, it’s worth naming the landscape clearly:

  • Lens — the original. Electron-based, polished, now commercial (Mirantis). The free Personal tier is non-commercial only. Pro is ~$22-35/user/month.
  • OpenLens — the community fork of Lens before Mirantis closed exec/logs/shell in v6.3 (January 2023). Maintenance has slowed significantly. No active release cadence.
  • FreeLens — a more active community fork, filling the gap left by OpenLens’ decline. Restores the missing features. No commercial backing.
  • k9s — terminal TUI, fast, keyboard-driven, single-cluster. Different audience.
  • Headlamp — CNCF Sandbox project, plugin-extensible, web-based.
  • Radar — Go binary, Apache 2.0, team-oriented, topology and event timeline focused.

The problem with OpenLens and FreeLens is not that they’re bad tools — they’re genuinely useful for the solo developer with one or two clusters. The problem is that they’re single-cluster-at-a-time desktop apps with no concept of team, no persistent state, and no awareness of the modern Kubernetes ecosystem (ArgoCD, Flux, Karpenter, KEDA). As your infrastructure grows, you outgrow them.


What Radar Actually Is

Radar is available in two forms:

  • Radar OSS: a single ~30MB Go binary, Apache 2.0, free forever. It can run locally (desktop app) or be deployed in-cluster via Helm. No sidecars, no feature gates.
  • Radar Cloud: same binary, adds a hosted control plane with fleet aggregation, 30-day event retention, SSO/SCIM, scoped RBAC, and shared URLs for team incident response. Priced per cluster ($99/cluster/month for Team), not per user.

The per-cluster pricing is a deliberate design decision: teams don’t pay more as they add engineers, only as they add clusters. For a 20-person platform engineering team managing 5 clusters, Radar Cloud runs $495/month (5 × $99). At the ~$22-35/user/month quoted above, 20 Lens Pro seats would cost roughly $440-700/month, so the per-cluster model pulls ahead as headcount grows faster than cluster count.

For most self-hosted environments, the OSS version is sufficient and costs nothing.


Key Features

Topology View

This is the most visually distinctive feature. Radar renders a live service graph for your cluster: deployments, services, ingresses, cross-namespace dependencies, and east-west traffic flows — all in a single view without running kubectl get all -A and stitching the output together mentally.

OpenLens and FreeLens have resource list views. They show you what exists. Radar shows you how things connect — which is what you actually need when debugging why Service A can’t reach Service B.

Persistent Event Timeline

Kubernetes events are ephemeral by default — they expire after approximately one hour. When something breaks at 2am and you’re looking at it at 9am, the events that explain what happened are gone. Logs may still be there if you’re running a log aggregator, but the Kubernetes-level events (pod restarts, scheduling failures, node pressure events, probe failures) are gone.

Radar retains events. The OSS version extends this beyond the default 1-hour cluster retention. The Cloud version retains 30 days. You can rewind the timeline to any point and reconstruct what the cluster looked like at that moment.

Neither OpenLens nor FreeLens has any event retention beyond what the cluster itself provides.

GitOps Integration (ArgoCD + Flux)

Radar auto-detects ArgoCD and Flux and surfaces sync state, drift, and health directly in the UI. You can see whether a deployment is in sync, when it last synced, and whether it drifted from the desired state in Git.

In OpenLens and FreeLens, ArgoCD resources appear as generic Kubernetes custom resources. You can see the CRDs, but there’s no purpose-built understanding of what they mean — no sync status visualization, no diff view, no rollback trigger.

Helm Management

Radar tracks Helm releases with full revision history and supports one-click rollbacks from the UI. This is similar to what OpenLens/FreeLens offer via the Helm releases view, but Radar adds revision diffing — you can see what changed between release 5 and release 6 before deciding to roll back.

Image Filesystem

You can browse container image filesystems through Radar without needing kubectl exec into a running pod or access to the container registry. Useful for security audits and debugging — you can verify what’s actually in an image at rest.

MCP Server (AI Integration)

Radar ships with an MCP (Model Context Protocol) server, which means you can connect Claude, Cursor, or GitHub Copilot directly to your cluster context and ask questions about it in natural language. The MCP server is token-optimized — it doesn’t dump raw YAML at the model, it structures cluster state into meaningful context.

Neither OpenLens nor FreeLens has this. It is also genuinely useful if you’re already using AI assistants for development work.

Cluster Audit

30 built-in best-practice checks — resource requests/limits, RBAC permissions, image pinning, network policies, security contexts. The checks are labeled by compliance framework. This is not a replacement for dedicated security tooling (Trivy, Falco, Polaris), but it’s a useful first-pass audit without leaving the tool you’re already using.

Multi-Cluster Support (Cloud)

The Cloud tier adds fleet-level visibility: a single view across all clusters, cross-cluster search, and drift detection between environments (e.g., staging vs. production). This is the feature that changes the calculus for platform engineering teams managing 5+ clusters.

OpenLens and FreeLens require you to switch cluster context manually. There is no fleet view.


Architecture: Why a Go Binary Matters

OpenLens and FreeLens are Electron apps — Chromium + Node.js wrapped in a desktop shell. This means:

  • 200-500MB install size
  • 1-2 second startup time on a fast machine, more on slower ones
  • Memory footprint in the hundreds of megabytes
  • Local kubeconfig required on each engineer’s machine

Radar’s in-cluster deployment is a single Go binary (~30MB) that runs as a Pod with a ServiceAccount. It connects to the hosted control plane over outbound WebSocket + TLS. No inbound firewall rules, no kubeconfig distribution, no per-engineer setup.

The local desktop app is also a lightweight Go binary — 65-second startup was demonstrated on a 322-node cluster. That’s not a typo.

For in-cluster deployment, the architecture means security is handled at the ServiceAccount level, not by distributing kubeconfigs to engineer laptops. That matters for teams with security requirements around credential management.


Feature Comparison

FeatureRadar OSSRadar CloudOpenLensFreeLens
LicenseApache 2.0Proprietary (hosted)MIT/GPLMIT
MaintenanceActive (weekly releases)ActiveStalledActive (community)
ArchitectureGo binary / in-clusterIn-cluster + hostedElectronElectron
Multi-clusterBasicFleet view
Event retentionExtended30 daysCluster default (~1h)Cluster default (~1h)
Topology view
GitOps (ArgoCD/Flux)CRDs onlyCRDs only
Helm management
kubectl exec / logs / shell✅ (restored)
MCP / AI integration
Cluster audit
SSO / SCIM
Shared incident URLs
Image filesystem browser
Cost tracking✅ (OpenCost)
PriceFree$99/cluster/monthFreeFree

When Radar Makes Sense

You’re managing multiple clusters. Even with the OSS version, the topology view and event timeline make Radar more useful than OpenLens/FreeLens at 3+ clusters. The Cloud fleet view is the compelling option at 5+.

Your team uses GitOps. If ArgoCD or Flux is part of your workflow, Radar’s native understanding of sync state and drift is meaningfully better than seeing CRDs in a generic list view.

You need post-mortem capability. If your incident review process involves looking at what the cluster was doing when the alert fired, you need event retention. Radar has it; OpenLens and FreeLens don’t.

You’re adopting AI tooling. The MCP server is the most forward-looking feature here. If you use Claude Code, Cursor, or Copilot for your infrastructure work, having cluster context available to those tools without copy-pasting YAML is a genuine productivity improvement.

You have a platform engineering team. Per-cluster pricing, SSO, SCIM, and shared incident URLs are features that only matter if you have more than one person managing infrastructure.

When OpenLens or FreeLens Still Makes Sense

You’re a solo developer with one or two clusters. OpenLens and FreeLens are familiar, local, and have zero setup overhead. If you don’t need team features, event retention, or topology views, they remain perfectly functional tools.

You’re deeply invested in the Lens UX. The resource tree, the terminal integration, the way Lens presents namespace-scoped resources — if your muscle memory is built around that interface, switching has a real cost. Radar is different, not just better.

You need maximum customization. OpenLens and FreeLens support plugins. Radar does not currently have a plugin system.

Your environment is air-gapped or has strict egress restrictions. Radar OSS can run fully in-cluster, but Radar Cloud requires outbound connectivity to the hosted control plane. OpenLens and FreeLens are fully local.


Getting Started

OSS installation takes about two minutes:

# Homebrew (macOS/Linux)
brew install skyhook-io/tap/radar

# Helm (in-cluster)
helm repo add skyhook https://charts.skyhook.io
helm install radar skyhook/radar \
  --namespace radar \
  --create-namespace \
  --set service.type=ClusterIP

Or download the binary directly from radarhq.io.


Verdict

Radar is the most interesting new entrant in the Kubernetes tooling space in a while — not because it replaces everything else, but because it addresses the specific gap that OpenLens and FreeLens never covered: teams, multiple clusters, and persistent state.

For a solo developer, OpenLens and FreeLens are still completely reasonable choices. For a platform engineering team managing more than two clusters with ArgoCD or Flux, Radar’s feature set is materially better and the OSS version costs nothing.

The active release cadence and the YC backing suggest this isn’t a one-person side project — there’s a team actively working on it. Whether the Cloud pricing sticks long-term is a question only usage will answer, but the Apache 2.0 core with an explicit “always open source” commitment is the right foundation.

Worth evaluating if you haven’t already.


Tested with Radar OSS v0.x on Kubernetes 1.29–1.32. Pricing and feature availability as of May 2026.

Kubernetes Cluster Autoscaler vs Karpenter: When to Use Each (2026)

Your pods are pending. Your on-call engineer is getting paged. Somewhere in the chain between “I need more compute” and “compute is available,” something is too slow. That something is almost always node provisioning — and the tool you chose to manage it determines whether that delay is 4 minutes or 45 seconds.

Node autoscaling is one of those infrastructure decisions that looks simple until you’re running it in production. Two schedulable pods sitting in Pending state doesn’t just mean a delayed deployment — it means latency spikes, dropped traffic, breached SLOs, and engineers debugging things that should have been invisible. At scale, it also means either burning money on over-provisioned nodes or gambling on under-provisioning at the worst possible moment.

Cluster Autoscaler (CA) has been the default answer for years. Karpenter emerged from AWS in 2021, was donated to the Kubernetes project in 2023, reached v1.0 in 2024, and by 2025 had become the default recommendation for most AWS-native clusters. In 2026, both tools are mature, widely deployed, and genuinely good, but they solve the problem differently, and picking the wrong one for your environment has real consequences.

This article is a deep technical comparison. It assumes you already know what Kubernetes is and have opinions about infrastructure. The goal is to give you a clear picture of how each tool works, where each one wins, and a decision framework you can actually use.


Why Node Autoscaling Is Hard

The fundamental tension in autoscaling is this: you want compute available before you need it, but you don’t want to pay for compute you’re not using. These goals are in direct conflict, and every autoscaling system is an attempt to find the least-bad trade-off.

Without autoscaling, you’re doing one of two things:

  1. Over-provisioning — you run enough nodes to handle peak load at all times. Your average utilization sits at 20–30%, and you’re paying for the other 70–80% to sit idle.
  2. Under-provisioning — you run lean, and when traffic spikes, pods go Pending. Your SLOs breach. You get paged at 3am to manually scale.

A common failure mode with poorly tuned autoscaling is the “thundering herd at scale-up” pattern: HPA creates new pods faster than node autoscaling can provision capacity. The provisioning window matters. With CA and typical ASG-backed node groups on AWS, you’re looking at 4–8 minutes. With Karpenter, 60–90 seconds. At 100 RPS and a 3-minute window, that’s 18,000 requests under degraded conditions.
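The arithmetic behind that last figure is easy to reproduce. The sketch below is only an illustration: the function name is made up for this example, and it assumes a constant request rate for the whole provisioning window.

```python
def requests_during_window(rps: float, window_seconds: float) -> int:
    """Requests that arrive while new capacity is still being provisioned."""
    return round(rps * window_seconds)


# The figure quoted above: 100 RPS over a 3-minute provisioning window.
print(requests_during_window(100, 3 * 60))  # 18000

# The same traffic against the provisioning ranges quoted above (in seconds).
for label, low, high in [("CA", 4 * 60, 8 * 60), ("Karpenter", 60, 90)]:
    print(label, requests_during_window(100, low), requests_during_window(100, high))
```

Real traffic is rarely flat during a spike, so treat the result as an order-of-magnitude estimate.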


Cluster Autoscaler: How It Actually Works

Cluster Autoscaler is a Kubernetes-native project under the kubernetes/autoscaler repository, in production since 2016, supporting AWS, GCP, Azure, Alibaba, DigitalOcean, and more.

The Node Group Model

CA operates on node groups — ASGs on AWS, MIGs on GCP, VMSSs on Azure. CA’s job is to decide when to increase or decrease the desired capacity of these groups. CA does not provision individual nodes. It scales node groups, and the node group provisions nodes. This indirection adds latency and reduces flexibility.

Scale-Up: Detecting Unschedulable Pods

CA runs a control loop (default scan interval: 10 seconds). For each Pending pod with PodScheduled=False, CA simulates adding a node of each known node group type and checks if the pod would become schedulable. When more than one node group would work, CA applies an expander to choose which group to scale:

  • least-waste — minimizes CPU/memory waste after scheduling (best default for cost)
  • most-pods — maximizes pods scheduled per scale-up operation
  • priority — lets you define ordering via ConfigMap
  • grpc — delegates to an external gRPC service

# Cluster Autoscaler deployment (AWS, production-tuned)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.3
          name: cluster-autoscaler
          resources:
            requests:
              cpu: 100m
              memory: 600Mi
            limits:
              cpu: 200m
              memory: 1Gi
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --expander=least-waste
            - --balance-similar-node-groups=true
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --scale-down-utilization-threshold=0.5
            - --max-graceful-termination-sec=600
            - --scan-interval=10s
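
To make the least-waste expander selected above more concrete, here is a minimal sketch of the idea. It is not CA's actual implementation: the node group names and sizes are hypothetical, and waste is simplified to the average unrequested share of CPU and memory on one new node.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeGroup:
    name: str    # hypothetical node group name
    cpu_m: int   # allocatable CPU per node, in millicores
    mem_mi: int  # allocatable memory per node, in MiB


def wasted_fraction(group: NodeGroup, pod_cpu_m: int, pod_mem_mi: int) -> Optional[float]:
    """Average unused share of CPU and memory if the pending pods land on one
    new node of this group. Returns None when the pods do not fit."""
    if pod_cpu_m > group.cpu_m or pod_mem_mi > group.mem_mi:
        return None
    cpu_waste = 1 - pod_cpu_m / group.cpu_m
    mem_waste = 1 - pod_mem_mi / group.mem_mi
    return (cpu_waste + mem_waste) / 2


def least_waste(groups: List[NodeGroup], pod_cpu_m: int, pod_mem_mi: int) -> Optional[NodeGroup]:
    """Pick the fitting group that leaves the least capacity idle."""
    scored = [(wasted_fraction(g, pod_cpu_m, pod_mem_mi), g) for g in groups]
    fitting = [(waste, g) for waste, g in scored if waste is not None]
    if not fitting:
        return None
    return min(fitting, key=lambda pair: pair[0])[1]


GROUPS = [
    NodeGroup("small", 2000, 8192),
    NodeGroup("medium", 4000, 16384),
    NodeGroup("large", 16000, 65536),
]

# Pending pods request 3000m CPU and 6000Mi memory in total.
print(least_waste(GROUPS, 3000, 6000).name)  # medium
```

The "small" group is skipped because the pods do not fit, and "large" loses because most of the node would sit idle.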

Scale-Down: The Conservative Approach

A node is a scale-down candidate only if:

  • CPU and memory utilization (by requests) is below the threshold (default: 50%)
  • All pods could be rescheduled elsewhere
  • No pod has cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  • The node has been underutilized for at least --scale-down-unneeded-time (default: 10m)

This conservatism prevents churn — a feature, not a bug.
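
The scale-down conditions above can be read as a single predicate. The following is a simplified sketch, not CA's code: the Pod class and its reschedulable flag are hypothetical stand-ins for the scheduling simulation CA actually performs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


@dataclass
class Pod:
    cpu_request_m: int
    mem_request_mi: int
    annotations: Dict[str, str] = field(default_factory=dict)
    reschedulable: bool = True  # hypothetical: "would fit on some other node"


def is_scale_down_candidate(
    pods: List[Pod],
    node_cpu_m: int,
    node_mem_mi: int,
    underutilized_minutes: float,
    threshold: float = 0.5,       # --scale-down-utilization-threshold
    unneeded_minutes: float = 10,  # --scale-down-unneeded-time
) -> bool:
    # Utilization is measured by requests, not by live usage.
    cpu_util = sum(p.cpu_request_m for p in pods) / node_cpu_m
    mem_util = sum(p.mem_request_mi for p in pods) / node_mem_mi
    if cpu_util >= threshold or mem_util >= threshold:
        return False
    if any(not p.reschedulable for p in pods):
        return False
    if any(p.annotations.get(SAFE_TO_EVICT) == "false" for p in pods):
        return False
    return underutilized_minutes >= unneeded_minutes


light = [Pod(500, 1024)]
print(is_scale_down_candidate(light, 4000, 16384, underutilized_minutes=12))  # True
print(is_scale_down_candidate(light, 4000, 16384, underutilized_minutes=5))   # False
```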


Karpenter: How It Actually Works

Karpenter is an open-source project originally built by AWS, donated in 2023 to the Kubernetes project (SIG Autoscaling, under the CNCF umbrella), and GA (v1.0) since mid-2024. Providers exist for AWS (stable), Azure (stable), and GCP (beta).

The Core Insight: Bypass the Node Group

Karpenter calls the EC2 RunInstances API directly, with no ASG involvement. This means:

  • Any instance type in a single request, without pre-configuring a node group
  • No intermediary: Karpenter → EC2 API → node joins cluster
  • Nodes right-sized to what workloads need, across the full instance catalog
  • Karpenter handles the full node lifecycle, including termination

NodePool and EC2NodeClass

# EC2NodeClass — cloud-specific parameters
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: "KarpenterNodeRole-my-cluster"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        encrypted: true
  metadataOptions:
    httpTokens: required
    httpPutResponseHopLimit: 1
---
# NodePool — intent and constraints
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano", "micro", "small", "medium", "large"]
      expireAfter: 720h
  limits:
    cpu: "1000"
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
    budgets:
      - nodes: "5%"
        schedule: "0 8 * * mon-fri"
        duration: 10h
      - nodes: "25%"

Just-in-Time Provisioning and Bin Packing

When pods go Pending, Karpenter reacts to the watch event (it does not poll) and immediately:
1. Collects all Pending pods
2. Simulates bin packing — fewest possible nodes across the full instance catalog
3. Selects instances that satisfy all pod requirements
4. Calls EC2 API to provision the optimal instance(s)
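
The bin-packing step can be sketched with a first-fit-decreasing heuristic. This is an illustration of the idea only, not Karpenter's scheduler: the instance names, sizes, and prices are hypothetical, and real packing also accounts for affinities, topology spread, daemonsets, and more.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class InstanceType:
    name: str          # hypothetical instance type
    cpu_m: int         # allocatable CPU, millicores
    mem_mi: int        # allocatable memory, MiB
    hourly_cents: int  # hypothetical price, in cents per hour


def pack(pods: List[Tuple[int, int]], itype: InstanceType) -> Optional[int]:
    """First-fit decreasing: nodes of `itype` needed for all (cpu_m, mem_mi)
    pods, or None if some pod cannot fit on this type at all."""
    nodes: List[List[int]] = []  # each entry is [free_cpu_m, free_mem_mi]
    for cpu, mem in sorted(pods, reverse=True):
        if cpu > itype.cpu_m or mem > itype.mem_mi:
            return None
        for node in nodes:
            if node[0] >= cpu and node[1] >= mem:
                node[0] -= cpu
                node[1] -= mem
                break
        else:
            nodes.append([itype.cpu_m - cpu, itype.mem_mi - mem])
    return len(nodes)


def best_plan(pods, catalog):
    """Fewest nodes first, then lowest total price: (count, total_cents, name)."""
    plans = []
    for itype in catalog:
        count = pack(pods, itype)
        if count is not None:
            plans.append((count, count * itype.hourly_cents, itype.name))
    return min(plans) if plans else None


CATALOG = [
    InstanceType("small", 4000, 8192, 17),
    InstanceType("medium", 8000, 16384, 34),
    InstanceType("large", 16000, 32768, 68),
    InstanceType("xlarge", 32000, 65536, 136),
]
PENDING = [(1000, 2048)] * 6 + [(2000, 4096)] * 2  # 10 vCPU, 20 GiB in total

print(best_plan(PENDING, CATALOG))  # (1, 68, 'large')
```

With these made-up prices, three "small" nodes would be cheaper than one "large" node. The sketch follows the "fewest possible nodes" objective described above, which is one reason the choice of objective matters.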

Disruption and Consolidation

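Karpenter’s differentiated value is active cluster consolidation. It evaluates whether nodes can be removed by redistributing pods onto others, or replaced with a smaller instance type. A c5.4xlarge (16 vCPU) running about 4 vCPU worth of pods gets replaced with a smaller type: a c5.2xlarge, or a c5.xlarge when the requests fit within its allocatable CPU, which is slightly below 4 vCPU after system reservations. Teams commonly report 30–50% compute cost reduction.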

consolidationPolicy options:

  • WhenEmpty: only remove nodes with no workload pods (safest)
  • WhenEmptyOrUnderutilized: also replace underutilized nodes with smaller ones
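
A rough sketch of the replacement decision behind consolidation follows. It is an assumption-laden illustration, not Karpenter's algorithm: the allocatable figures and prices in the catalog are approximate, illustrative values, and the function name is invented for this example.

```python
from typing import Dict, Optional, Tuple

# name -> (allocatable CPU in millicores, allocatable memory in MiB, cents per hour)
# Illustrative values only; real allocatable capacity depends on kubelet reservations.
CATALOG: Dict[str, Tuple[int, int, int]] = {
    "c5.xlarge": (3920, 6900, 17),
    "c5.2xlarge": (7910, 14500, 34),
    "c5.4xlarge": (15890, 29000, 68),
}


def smallest_replacement(requested_cpu_m: int, requested_mem_mi: int, current: str) -> Optional[str]:
    """Cheapest type that fits the node's pod requests and costs less than
    the current type, or None when no cheaper fit exists."""
    current_price = CATALOG[current][2]
    candidates = [
        (price, name)
        for name, (cpu, mem, price) in CATALOG.items()
        if cpu >= requested_cpu_m and mem >= requested_mem_mi and price < current_price
    ]
    return min(candidates)[1] if candidates else None


print(smallest_replacement(3500, 6000, "c5.4xlarge"))    # c5.xlarge
print(smallest_replacement(4000, 6000, "c5.4xlarge"))    # c5.2xlarge
print(smallest_replacement(15000, 20000, "c5.4xlarge"))  # None
```

The second call shows why allocatable capacity matters: a full 4000m of requests does not fit on a nominal 4 vCPU instance once system reservations are subtracted.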


Architecture Comparison

Dimension | Cluster Autoscaler | Karpenter
----------|--------------------|----------
Node provisioning model | Scales node groups (ASG/MIG/VMSS) | Direct cloud API, no node groups
Instance flexibility | Pre-defined node group types | Full instance catalog at runtime
Scale-up trigger | Polling (10s scan interval) | Watch-based event (near-instant)
Scale-down | Removes underutilized nodes | Removes + consolidates + right-sizes
Spot handling | Via ASG + AWS Node Termination Handler | Native, first-class, no NTH needed
Configuration model | Deployment flags | Declarative CRDs
Cloud support | All major + on-prem | AWS (GA), Azure (stable), GCP (beta)
Consolidation | No | Yes
Community maturity | Very mature (since 2016) | Mature (GA 2024)

Scaling Speed: The Numbers

Cluster Autoscaler on AWS (typical):
1. Pod Pending → CA scan detects (0–10s)
2. ASG UpdateAutoScalingGroup API call (~15–30s)
3. EC2 instance starts (1–2 min)
4. Node bootstrap + kubelet registration (30–60s)
5. Pod scheduled (5–10s)

Total: roughly 2–4 minutes for the steps listed; 3–6 minutes is typical end to end (up to 12 min during high-demand periods)

Karpenter on AWS (typical):
1. Pod Pending → watch event fires (~1s)
2. EC2 RunInstances API call (~2–3s)
3. EC2 instance starts (same hardware — 1–2 min)
4. Node bootstraps + Ready (30–60s)
5. Pod scheduled (~5s)

Total: 90 seconds to 3 minutes
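
Adding up the per-stage ranges gives a quick sanity check on both totals. The numbers below are copied from the two step lists above; the dictionary keys are just shorthand labels for this sketch.

```python
from typing import Dict, Tuple

# (low, high) seconds per stage, taken from the step lists above.
CA_STAGES: Dict[str, Tuple[int, int]] = {
    "scan detects pending pod": (0, 10),
    "ASG API call": (15, 30),
    "EC2 instance start": (60, 120),
    "bootstrap + kubelet registration": (30, 60),
    "pod scheduled": (5, 10),
}
KARPENTER_STAGES: Dict[str, Tuple[int, int]] = {
    "watch event fires": (1, 1),
    "EC2 API call": (2, 3),
    "EC2 instance start": (60, 120),
    "bootstrap + Ready": (30, 60),
    "pod scheduled": (5, 5),
}


def total_range(stages: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the best-case and worst-case seconds across all stages."""
    return (sum(lo for lo, _ in stages.values()), sum(hi for _, hi in stages.values()))


print(total_range(CA_STAGES))         # (110, 230)
print(total_range(KARPENTER_STAGES))  # (98, 189)
```

The itemised Karpenter stages land close to the quoted 90 seconds to 3 minutes. The itemised CA stages sum to under 4 minutes, so longer end-to-end totals presumably include delays that are not broken out in the list; that reading is an assumption, not something the step list states.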


Cost Optimization: Where Karpenter Pulls Ahead

Right-sizing: CA requires pre-defined node groups. Karpenter selects the minimum viable instance for the pending workload from the full catalog.

Consolidation vs scale-down: CA removes underutilized nodes. Karpenter replaces a large underutilized node with a smaller one that still fits all pods. This produces compounding savings over time.

Spot handling: Karpenter receives EC2 interruption notices, pre-provisions a replacement, and drains the node — all within the 2-minute window. No AWS Node Termination Handler required. It also diversifies spot requests across instance types automatically to reduce simultaneous interruption risk.


Multi-Cloud Support in 2026

Cloud | Cluster Autoscaler | Karpenter
------|--------------------|----------
AWS | ✅ Production-stable | ✅ Production-stable (reference impl.)
GCP | ✅ Production-stable | ⚠️ Beta (karpenter-provider-gcp)
Azure | ✅ Production-stable | ✅ Stable (karpenter-provider-azure)
Alibaba | ✅ Supported | ❌ No provider
DigitalOcean | ✅ Supported | ❌ No provider
On-premises / Cluster API | ✅ Supported | ❌ Not supported

When Cluster Autoscaler Is Still the Right Choice

  1. Non-AWS environments — GCP, Alibaba, DigitalOcean, on-prem with Cluster API
  2. Existing node group architecture — significant investment in ASG design, compliance tooling
  3. Regulatory constraints — some frameworks require ASG-backed provisioning audit trails
  4. Cluster API / bare metal — CA is the only mature option
  5. Team familiarity and working-well CA deployment — migration cost may not justify benefit

When Karpenter Is the Right Choice

  1. AWS-native, cost optimization priority — right-sizing + consolidation = meaningful cost reduction
  2. Diverse and variable workloads — batch, spot, GPU, stateless APIs — Karpenter handles all with a few NodePools
  3. Spot-heavy clusters — native interruption handling, diversification, no NTH
  4. Declarative infrastructure-as-code culture — NodePools version cleanly in Git
  5. Low-latency scaling requirements — event-driven workloads, KEDA-triggered jobs, sharp traffic spikes

Running Both: Migration Path and Gotchas

Separating Responsibility

Use labels and taints to prevent CA and Karpenter from managing the same nodes:

# NodePool with taint — CA-managed pods won't tolerate this
spec:
  template:
    metadata:
      labels:
        provisioner: karpenter
    spec:
      taints:
        - key: karpenter.sh/provisioned
          value: "true"
          effect: NoSchedule

Gradual Migration

  1. Phase 1 — Karpenter manages spot/batch workloads. CA manages on-demand production nodes.
  2. Phase 2 — Migrate spot workloads fully. Remove AWS NTH.
  3. Phase 3 — Migrate on-demand. Reduce CA node group capacity gradually.
  4. Phase 4 — Decommission CA once all groups are empty.

Key Gotchas

  • Karpenter consolidation + permissive PDBs: maxUnavailable: 100% will cause disruptive consolidation. Audit PDBs before enabling WhenEmptyOrUnderutilized.
  • NodePool limits are hard stops: pods go Pending indefinitely once the limit is reached. Monitor utilization.
  • AMI drift: the @latest alias picks up new AMIs on new nodes. Consider pinning for strict change control.
  • Simultaneous scale-down conflicts: use strict label/taint segregation during migration.

Decision Framework

Factor | Cluster Autoscaler | Karpenter
-------|--------------------|----------
Cloud support | All clouds + on-prem | AWS (GA), Azure (stable), GCP (beta)
Provisioning speed | 4–8 minutes | 60–120 seconds
Instance flexibility | Node group pre-config required | Full catalog, runtime selection
Cost optimization | Scale-down only | Scale-down + consolidation + right-sizing
Spot integration | Via ASG + NTH | Native, first-class
Operational complexity | Lower | Moderate
Cluster API / bare metal | Yes | No
Consolidation | No | Yes

Running on AWS?
├── No → Azure? → Karpenter (stable) or CA
│        GCP?   → CA or GKE NAP (preferred)
│        Other  → Cluster Autoscaler
│
└── Yes → Hard regulatory constraints on non-ASG provisioning?
          ├── Yes → Cluster Autoscaler
          └── No → Cost optimization priority or diverse workloads?
                   ├── Yes → Karpenter
                   └── No → Either (choose by team preference)
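
The same decision tree can be written as a small function, which is handy if you want to encode it in a runbook or a platform checklist. The function and parameter names are invented for this sketch; the branches mirror the tree above and nothing more.

```python
def recommend_autoscaler(
    cloud: str,
    asg_required_by_regulation: bool = False,
    cost_or_diverse_workloads: bool = False,
) -> str:
    """Walk the decision tree above and return its recommendation."""
    cloud = cloud.lower()
    if cloud == "aws":
        if asg_required_by_regulation:
            return "Cluster Autoscaler"
        if cost_or_diverse_workloads:
            return "Karpenter"
        return "Either (team preference)"
    if cloud == "azure":
        return "Karpenter (stable) or Cluster Autoscaler"
    if cloud == "gcp":
        return "Cluster Autoscaler or GKE NAP (preferred)"
    return "Cluster Autoscaler"


print(recommend_autoscaler("aws", cost_or_diverse_workloads=True))  # Karpenter
print(recommend_autoscaler("digitalocean"))                         # Cluster Autoscaler
```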

FAQ

Is Karpenter a drop-in replacement for Cluster Autoscaler?

No. Different configuration model, different concepts. Migration requires re-expressing node group config as NodePools/NodeClasses, auditing PDBs, and running both in parallel. Budget at least a sprint for a medium-sized cluster.

Can I run Karpenter on self-managed Kubernetes (not EKS)?

Yes, but non-trivial. Karpenter requires IAM credentials (IRSA or equivalent) to call EC2 APIs. On self-managed clusters, this requires more setup than on EKS where IRSA is built-in.

How does Karpenter interact with HPA and VPA?

No conflict. HPA creates pods → pods go Pending if insufficient nodes → Karpenter provisions nodes → pods scheduled. VPA adjusts pod resource requests, which Karpenter uses as inputs for bin packing.

What happens when Karpenter itself goes down?

Existing nodes and pods continue normally. New pods requiring provisioning go Pending until Karpenter recovers. Scale-down and consolidation pause. Deploy multiple replicas with leader election for production.

Does Karpenter support GPU nodes?

Yes. GPU instance types (p3, p4, g4, g5) can be included in NodePool requirements. Create dedicated NodePools with appropriate taints for GPU-requesting pods.

How does Karpenter handle AMI updates?

The expireAfter field forces node rotation. When a node expires, Karpenter pre-provisions a replacement with the latest AMI per EC2NodeClass, then drains and terminates the old node — a rolling AMI update mechanism without additional tooling.

Is Cluster Autoscaler still actively maintained?

Yes. CA remains under active development in kubernetes/autoscaler, with releases tracking Kubernetes minor versions. It is not being deprecated. For non-AWS environments and working CA deployments, it remains a fully supported and rational choice.


Tested against Kubernetes 1.28–1.32. Karpenter v1.x API (GA). CA v1.30.x. AWS provider examples; Azure and GCP provider details may differ.

Kubernetes Resource Requests and Limits: The Complete Production Guide

Your pods are being OOMKilled at 3 AM. Your latency p99 spikes every few minutes with no obvious cause. Your cluster scheduler is placing workloads on nodes that can’t sustain them. In most production Kubernetes incidents, misconfigured resource requests and limits are either the direct cause or an accelerating factor.

This is not a “what are requests and limits” tutorial. It is a deep technical guide for engineers who run Kubernetes in production and need to understand what actually happens inside the kernel when these values are set — and what the consequences are when they are wrong.


What Requests and Limits Actually Are

The Kubernetes documentation explains requests and limits at the API level. What it underexplains is the enforcement mechanism: cgroups.

When the kubelet admits a pod onto a node, it creates a cgroup hierarchy for that pod under /sys/fs/cgroup/. Each container in the pod gets its own cgroup. The values you set in your pod spec translate directly into cgroup parameters:

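Pod spec field | cgroup parameter
---------------|-----------------
CPU request | cpu.shares (cgroups v1) or cpu.weight (cgroups v2)
CPU limit | cpu.cfs_quota_us and cpu.cfs_period_us (cgroups v1) or cpu.max (cgroups v2)
Memory request | No cgroup parameter by default (advisory: used for scheduling and eviction ranking; the kubelet does not set memory.soft_limit_in_bytes)
Memory limit | memory.limit_in_bytes (cgroups v1) or memory.max (cgroups v2); hard enforcement, triggers OOMKill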
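
To see how the CPU values translate, here is a sketch of the conversions as they have long been implemented in the kubelet and container runtimes. The function names are made up for this example, the default 100ms CFS period is assumed, and the shares-to-weight step uses the long-standing linear mapping; newer runtime versions may use a different curve.

```python
CFS_PERIOD_US = 100_000  # default CFS period: 100ms
MIN_SHARES = 2
MAX_SHARES = 262_144
MIN_QUOTA_US = 1_000


def cpu_request_to_shares(millicores: int) -> int:
    """CPU request -> cgroups v1 cpu.shares (1000m maps to 1024 shares)."""
    if millicores == 0:
        return MIN_SHARES
    shares = millicores * 1024 // 1000
    return max(MIN_SHARES, min(shares, MAX_SHARES))


def cpu_limit_to_quota_us(millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU limit -> cpu.cfs_quota_us: microseconds of CPU time per period."""
    return max(MIN_QUOTA_US, millicores * period_us // 1000)


def shares_to_weight(shares: int) -> int:
    """cgroups v1 shares (2..262144) -> cgroups v2 cpu.weight (1..10000),
    linear mapping."""
    return 1 + ((shares - 2) * 9999) // 262142


print(cpu_request_to_shares(500))   # 512
print(cpu_limit_to_quota_us(500))   # 50000
print(shares_to_weight(512))        # 20
```

A 500m limit therefore means 50ms of CPU time in every 100ms period, which is where throttling comes from.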

The scheduler uses requests to make placement decisions. It does not know about actual utilization — it knows about committed capacity. A node with 4 cores where running pods have a total CPU request of 3.5 cores has 0.5 cores of schedulable capacity remaining, even if actual CPU utilization is 15%.

This is why you can have a fully “utilized” cluster (by requests) where nodes are idle, and why you can have nodes at 95% CPU utilization that still accept new pods because their requests are low.
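
The committed-capacity view is simple enough to express in a few lines. This sketch uses the 4-core example from above with hypothetical per-pod requests that add up to 3.5 cores; live utilization never appears in the calculation, which is the point.

```python
from typing import List


def schedulable_headroom_m(allocatable_m: int, pod_requests_m: List[int]) -> int:
    """CPU (millicores) the scheduler still considers free: allocatable
    minus the sum of requests. Actual usage plays no part."""
    return allocatable_m - sum(pod_requests_m)


def fits(allocatable_m: int, pod_requests_m: List[int], new_request_m: int) -> bool:
    return new_request_m <= schedulable_headroom_m(allocatable_m, pod_requests_m)


running = [1500, 1000, 1000]  # hypothetical pods, 3.5 cores requested in total

print(schedulable_headroom_m(4000, running))  # 500
print(fits(4000, running, 500))               # True
print(fits(4000, running, 750))               # False, even if the node is 15% busy
```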

The kubelet uses limits to enforce runtime constraints via those cgroup parameters. The scheduler never sees limits.


CPU vs Memory: Why They Behave Fundamentally Differently

This is the most consequential thing to understand about Kubernetes resource management, and it is routinely misunderstood even by experienced engineers.

CPU Is Compressible

CPU is a time-shared resource. If your container tries to use more CPU than its limit allows, the Linux CFS scheduler simply throttles it — it stops getting CPU time until the next scheduling period. The process continues. It just waits.

From the application’s perspective: things slow down. Latency increases. Throughput drops. But the process does not die.
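
A small model makes the latency effect tangible. The sketch below assumes a single-threaded burst that starts at the beginning of a CFS period with a full quota and the default 100ms period; the function name is made up, and real workloads with multiple threads burn quota faster than this.

```python
def wall_time_ms(cpu_needed_ms: float, limit_millicores: int, period_ms: float = 100) -> float:
    """Wall-clock time for a single-threaded burst that needs `cpu_needed_ms`
    of CPU time under a CFS quota derived from the CPU limit."""
    if cpu_needed_ms <= 0:
        return 0
    quota_ms = limit_millicores * period_ms / 1000
    if quota_ms >= period_ms:
        return cpu_needed_ms  # one thread cannot exceed a full period anyway
    full_periods, remainder = divmod(cpu_needed_ms, quota_ms)
    if remainder == 0:
        # The last slice finishes exactly when its quota runs out.
        return (full_periods - 1) * period_ms + quota_ms
    return full_periods * period_ms + remainder


print(wall_time_ms(30, 500))   # 30   -> fits in the first 50ms slice
print(wall_time_ms(200, 500))  # 350.0 -> runs 50ms, waits 50ms, repeated
print(wall_time_ms(200, 1000)) # 200  -> no throttling at a full core
```

In this model, a request that needs 200ms of CPU takes 350ms under a 500m limit, with no error and no log line to explain it.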

Memory Is Not Compressible

Memory is not time-shared. If your container tries to allocate memory beyond its limit, there is no “slow down” path. The Linux OOM killer selects a process in the cgroup and kills it. The container dies.

From the application’s perspective: the process is terminated. Kubernetes restarts the container. You see OOMKilled in kubectl describe pod.

Property | CPU | Memory
---------|-----|-------
Enforcement | CFS throttling | OOM Kill
Process survives? | Yes (degraded performance) | No (killed and restarted)
Compressible? | Yes | No
Scheduler visibility | Requests only | Requests only
Over-limit consequence | Latency spikes | Container restart
Setting limits: recommended? | Situational (see below) | Always

This asymmetry drives every recommendation in the rest of this guide.


QoS Classes: Eviction Priority Under Pressure

Kubernetes assigns each pod a Quality of Service (QoS) class based on the requests and limits set across all its containers. This class determines eviction priority when a node is under memory pressure.

Guaranteed

Condition: Every container has CPU and memory requests and limits set, and requests equal limits for both CPU and memory.

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Guaranteed pods are the last to be evicted. The kubelet will exhaust BestEffort and Burstable pods before touching these. They get the most predictable resource allocation on the node.

Warning: Guaranteed does not mean “always available.” It means “last to be killed.” On a heavily overloaded node, even Guaranteed pods can be evicted.

Burstable

Condition: At least one container has a CPU or memory request or limit set, but the pod does not meet Guaranteed criteria.

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"

Burstable pods are evicted after BestEffort but before Guaranteed. They can burst above their request when capacity is available, but they are not protected when the node is under pressure.

BestEffort

Condition: No container in the pod has any CPU or memory requests or limits set.

# No resources block at all

BestEffort pods are evicted first, always. They get whatever capacity is left over after scheduled workloads consume their requested share. On a loaded node, they may be starved entirely.

In production: never run stateful workloads or business-critical services as BestEffort. The Kubernetes scheduler will place them anywhere, and the kubelet will kill them first.


Common Misconfiguration Patterns and Their Consequences

Pattern 1: No Requests or Limits Set

Effect: BestEffort QoS. First to be evicted under memory pressure. The scheduler places these pods with no real data: its resource scoring (the NodeResourcesFit plugin, which uses the LeastAllocated strategy by default) only considers requests, so a pod that declares none looks nearly free and may land on the same nodes as heavily loaded workloads.

Real consequence: Your “lightweight” background jobs kill your API servers at 3 AM when a memory spike triggers eviction and BestEffort pods happen to be sitting next to them on the same node.

Pattern 2: Requests Equal Limits (Guaranteed QoS)

This is the common “safe” pattern recommended in older Kubernetes documentation. It is not wrong, but it has a trap:

CPU limits = CPU requests means CPU throttling is guaranteed to trigger. Your pod will be throttled the moment it tries to burst above the request — during startup, during GC, during a traffic spike — even if the node has abundant free CPU.

For latency-sensitive applications, this means predictable throttling spikes at exactly the moments you need the most CPU.

Memory: Setting memory request = memory limit is appropriate and recommended. The behavior is correct: the pod runs in a controlled memory budget.

Pattern 3: Limits Much Higher Than Requests (Burstable with High Ratio)

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "4000m"
    memory: "4Gi"

This is the opposite extreme. The scheduler thinks this pod needs 100m CPU and 128Mi memory. Dozens of these can be scheduled onto a single node. When they all burst simultaneously — which they will, during a deployment, a traffic event, or a GC cycle — the node is overloaded, memory pressure triggers OOMKill cascades, and the scheduler has no idea anything is wrong because the committed capacity (by requests) looks fine.

The limit:request ratio matters. A 10x or 20x memory limit:request ratio on many pods is a recipe for node instability. A reasonable starting point is 2x–4x for memory, less for CPU.

Pattern 4: CPU Limits Set to “Be Safe”

This is the subtlest misconfiguration and the one with the most hidden latency impact. We cover it in depth in the next section.


The CPU Throttling Problem: CFS Bandwidth and Hidden Latency

This is where many production Kubernetes deployments have a silent performance problem they cannot easily diagnose.

How CFS Bandwidth Throttling Works

The Linux Completely Fair Scheduler (CFS) enforces CPU limits using bandwidth control. The relevant parameters are:

  • cpu.cfs_period_us: the accounting period, default 100ms
  • cpu.cfs_quota_us: how many microseconds of CPU time the cgroup can use per period

If you set cpu: "500m" as a limit, Kubernetes sets cpu.cfs_quota_us = 50000 (50ms per 100ms period). This means the container can use at most 50% of one CPU core per 100ms window.

The problem: quota is enforced per period, not as a moving average. If your container uses its full 50ms allocation in the first 60ms of a period, it is throttled for the remaining 40ms — even if the node has 7 idle CPUs. The CPU sits idle. Your container waits.
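
To make the arithmetic concrete, here is a sketch of how a few CPU limits translate into CFS quota, assuming the default 100ms period. The container names are hypothetical, and the comments show the values you would expect to find in each container's cgroup.

```yaml
# quota (µs) = CPU limit in cores × 100000 µs period
containers:
- name: small-api            # hypothetical
  resources:
    limits:
      cpu: "500m"            # cgroup v1: cpu.cfs_quota_us = 50000
                             # cgroup v2: cpu.max = "50000 100000"
- name: worker               # hypothetical
  resources:
    limits:
      cpu: "2"               # cgroup v1: cpu.cfs_quota_us = 200000
                             # cgroup v2: cpu.max = "200000 100000"
- name: unlimited            # hypothetical
  resources:
    requests:
      cpu: "500m"            # no limit: quota is -1 (v1) or "max" (v2), so no throttling
```

The quota is shared by all threads in the container. With the 2-core limit above and 8 busy threads, the 200ms of quota is consumed after 25ms of wall-clock time, and the container is throttled for the remaining 75ms of the period.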

Why This Causes Latency Spikes Even at Low Utilization

This is counterintuitive and the source of many production mysteries. You can have a container running at 10% average CPU utilization that is regularly throttled, because its instantaneous CPU usage within a single 100ms window exceeds its quota.

Java applications with JVM garbage collection are particularly vulnerable. GC causes a CPU burst of short duration. If that burst exceeds the per-period quota, the GC pause is extended artificially by throttling — even though the GC event itself would have been short.

The same applies to Node.js event loop processing, Python import at startup, and any application that has bursty CPU behavior (which is most of them).

The Cloudflare and Netflix Evidence

Cloudflare published findings showing that CPU throttling was responsible for significant tail latency increases in their containerized workloads, and that removing CPU limits reduced p99 latency substantially for services that appeared to have headroom. Netflix has documented similar patterns in their capacity planning work, noting that per-period quota enforcement does not model real application CPU behavior accurately.

The kernel community has been aware of this for years. Moving to cgroups v2 (GA in Kubernetes 1.25), with its better scheduler integration, helps but does not eliminate the problem. Nodes running cgroups v2 may see less throttling under the same limits, but the fundamental issue remains: CPU limits throttle bursty applications unpredictably.

The Recommendation: Consider Not Setting CPU Limits

This is controversial but grounded in the evidence:

For latency-sensitive services: do not set CPU limits. Set CPU requests accurately and rely on the scheduler for placement.

The argument:
– CPU throttling is a soft failure mode that is hard to observe and diagnose
– OOMKill is a hard failure mode that is visible and recoverable
– CPU requests give the scheduler accurate placement data without creating throttling
– Nodes handle CPU oversubscription gracefully through time-sharing; they do not handle memory oversubscription gracefully

When to still set CPU limits:
– Multi-tenant clusters where noisy neighbor isolation is critical
– Batch workloads where predictable CPU allocation matters more than latency
– When your monitoring and alerting can catch CPU starvation at the node level

When you do not set CPU limits, you must set CPU requests accurately. A request of 100m for a service that normally uses 800m means the scheduler places it on a node that cannot actually sustain it. The result is real CPU starvation, not artificial throttling — but it is CPU starvation nonetheless.


Memory: Always Set Limits

The contrast with CPU is direct. Memory is non-compressible. A container that leaks memory or has a runaway allocation will consume all available node memory if unconstrained. This does not degrade gracefully — it triggers the OOM killer, which may kill unrelated processes on the node.

Always set memory limits. Always.

The consequence — OOMKill — is visible, logged, and Kubernetes handles it by restarting the container. An OOMKilled exit code is actionable: you either have a memory leak, your limit is too low, or your sizing methodology is wrong. All three are diagnosable.

The alternative — no memory limit — means a single leaking pod can destabilize an entire node and trigger eviction cascades affecting unrelated workloads.

Set memory requests equal to the p95 steady-state usage of your application. Set memory limits at 1.5x–2x the request to absorb traffic spikes and GC pressure. Profile your application under load to establish these baselines.


Vertical Pod Autoscaler (VPA)

VPA is the Kubernetes component designed to solve the sizing problem automatically. It observes actual resource utilization and recommends (or applies) adjusted requests.

How VPA Works

VPA has three components:

  • Recommender: Watches historical metrics and computes recommended requests based on observed utilization. Does not modify pods.
  • Updater: Evicts pods whose current requests differ significantly from recommendations (when VPA mode is Auto or Recreate).
  • Admission Controller: Mutates pod specs at admission time to apply recommendations from the Recommender.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"   # Recommend only — do not evict pods
  resourcePolicy:
    containerPolicies:
    - containerName: api-server
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

VPA Modes

| Mode | Behavior |
|---|---|
| Off | Compute recommendations only. No pod mutations. |
| Initial | Apply recommendations to new pods only. Do not evict running pods. |
| Recreate | Evict pods when recommendations change significantly. |
| Auto | Currently equivalent to Recreate. May change in future versions. |

When to Use VPA

Right-sizing during initial rollout: Run VPA in Off mode for 1–2 weeks on a new service. Review recommendations before applying. This is the most valuable use case.

Services with unpredictable or seasonal load patterns: VPA adapts requests based on observed behavior. Combined with HPA for horizontal scaling, this gives you right-sized replicas that scale out horizontally.

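VPA and HPA cannot both manage the same resource. If HPA is scaling on CPU utilization, do not let VPA adjust CPU requests: utilization is measured relative to the request, so every VPA change shifts the HPA target and the two fight each other. Restrict VPA to memory with controlledResources: ["memory"], or keep it in Off mode and apply CPU recommendations manually, and let HPA manage scale.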
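
One way to keep the two autoscalers apart is to give each its own resource. The sketch below assumes the api-server Deployment from the earlier example. The HPA name, replica bounds, and 70% target are illustrative values, not recommendations.

```yaml
# VPA looks at memory only, in recommendation mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: api-server
      controlledResources: ["memory"]   # CPU is left to the HPA below
---
# HPA scales replicas on CPU utilization (relative to the CPU request)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa                  # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3                        # illustrative
  maxReplicas: 20                       # illustrative
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70          # illustrative
```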

VPA limitations:
– Requires pod restarts to apply recommendations (Updater evicts pods)
– Does not work well with stateful workloads in strict availability windows
– Recommender needs sufficient history (at least a few days) to produce reliable recommendations
– Does not account for traffic spikes that haven’t been observed yet


LimitRange and ResourceQuota: Namespace-Level Guardrails

Requests and limits on individual pods solve the per-workload problem. LimitRange and ResourceQuota solve the namespace and cluster-level governance problem.

LimitRange

LimitRange sets default requests and limits for containers in a namespace, and enforces minimum/maximum boundaries. Any pod admitted to the namespace that does not have explicit requests/limits set will receive the defaults.

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4000m"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
  - type: Pod
    max:
      cpu: "8000m"
      memory: "16Gi"
  - type: PersistentVolumeClaim
    max:
      storage: "50Gi"
    min:
      storage: "1Gi"

Key behaviors:
– default applies as the limit for containers that do not set a limit
– defaultRequest applies as the request for containers that set neither a request nor a limit (a container that sets only a limit gets a request equal to that limit)
– max and min cause admission to fail if violated
– LimitRange applies at admission time — changing it does not affect running pods

Use LimitRange to:
– Prevent BestEffort pods from being admitted (by setting defaultRequest values)
– Enforce organizational standards for minimum resource specifications
– Protect the cluster from pods requesting unbounded resources

ResourceQuota

ResourceQuota limits the total amount of resources that can be consumed by all pods in a namespace. This is the multi-tenant governance tool.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "100"
    persistentvolumeclaims: "20"
    requests.storage: "500Gi"
    count/deployments.apps: "50"
    count/services: "50"
    count/secrets: "100"
    count/configmaps: "100"

Critical interaction with LimitRange: When a ResourceQuota constrains requests or limits for a resource (for example limits.memory), every new pod in the namespace must specify that value for every container, or it is rejected at admission. This is why LimitRange defaults are important: they fill in the missing values so that pods without explicit resources are not rejected by the quota system.

Use ResourceQuota to:
– Enforce team/application resource budgets in shared clusters
– Prevent runaway deployments from consuming all cluster capacity
– Implement chargeback policies (track resource consumption per namespace)


Practical Sizing Methodology

Step 1: Instrument Before You Set Values

Deploy initially with only requests set (no CPU limits, memory limits set conservatively high) and monitor for 2–4 weeks under realistic load.

Useful PromQL queries for sizing:

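# p95 CPU usage (in cores) over the last 7 days.
# container_cpu_usage_seconds_total is a counter, not a histogram,
# so take the quantile over a subquery of 5m rates.
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{
    container="api-server",
    namespace="production"
  }[5m])[7d:5m]
)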

# p99 memory working set over the last 7 days
quantile_over_time(0.99,
  container_memory_working_set_bytes{
    container="api-server",
    namespace="production"
  }[7d]
)

# CPU throttling ratio: share of CFS periods in which the container was throttled (alert if >5%)
rate(container_cpu_cfs_throttled_periods_total{container="api-server"}[5m])
/
rate(container_cpu_cfs_periods_total{container="api-server"}[5m])

Step 2: Set CPU Requests from p95 Observations

Set CPU request = p95 CPU usage under realistic production load. For latency-sensitive services: do not set CPU limits. For batch or background jobs: set CPU limits at 2x–4x the request.

Step 3: Set Memory Requests and Limits

Set memory request = p95 memory working set over at least 7 days. Set memory limit = max(observed peak, 1.5 × request). For Java or Python services that process large payloads in memory, use 2x.

# Production example: Java microservice
resources:
  requests:
    cpu: "500m"       # p95 observed: ~420m
    memory: "768Mi"   # p95 observed: ~680Mi
  limits:
    # No CPU limit — latency-sensitive service
    memory: "1.5Gi"   # 2x request, covers GC pressure

Step 4: Use VPA Recommendations to Validate

Run VPA in Off mode alongside your manually-set values. After 1–2 weeks, compare VPA recommendations to your current settings.

Step 5: Adjust for Workload Lifecycle Events

Account for: JVM warmup at startup (CPU spike 3–10x steady-state), rolling deployment overlap (namespace quota headroom), and known traffic peaks (size to peak, not average).


Decision Framework: What to Set Based on Workload Type

| Workload Type | CPU Request | CPU Limit | Memory Request | Memory Limit | QoS Target |
|---|---|---|---|---|---|
| Latency-sensitive API (Go, Java, Node) | p95 observed | Do not set | p95 observed | 1.5–2x request | Burstable |
| Batch / background jobs | p50 observed | 2–4x request | p95 observed | 1.5x request | Burstable |
| System-critical (coredns, metrics-server) | Conservative | Equal to request | Conservative | Equal to request | Guaranteed |
| Stateful / databases (in-cluster) | p95 observed | Do not set | p99 observed | 1.25x request | Burstable |
| Dev/test workloads | Low (100m) | 2x request | Low (128Mi) | 2x request | Burstable |
| Sidecar containers (envoy, otel-collector) | Profile individually | Contextual | Profile individually | 1.5x request | Matches primary |

Monitoring and Alerting

# OOMKill rate
- alert: ContainerOOMKilled
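  # last_terminated_reason is a 0/1 gauge, so pair it with a recent restart
  expr: |
    (kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1)
    and on (namespace, pod, container)
    (increase(kube_pod_container_status_restarts_total[5m]) > 0)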
  for: 0m
  labels:
    severity: warning

# CPU throttling >10%
- alert: CPUThrottlingHigh
  expr: |
    rate(container_cpu_cfs_throttled_periods_total[5m])
    /
    rate(container_cpu_cfs_periods_total[5m])
    > 0.10
  for: 5m
  labels:
    severity: warning

# Memory near limit >85%
- alert: MemoryNearLimit
  expr: |
    container_memory_working_set_bytes
    /
    (container_spec_memory_limit_bytes > 0)
    > 0.85
  for: 5m
  labels:
    severity: warning

FAQ

Q: My Java application keeps getting OOMKilled but I’ve set limits at 2x average usage. What am I missing?

The JVM heap (-Xmx) is not the only memory consumer. Off-heap buffers, Metaspace, thread stacks, and JVM overhead add 25–40% on top. Set -Xmx at ~75% of your container memory limit. For a 1Gi limit: -Xmx768m is a safe starting point.
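
Instead of hard-coding -Xmx, you can let the JVM derive the heap size from the container limit. A sketch, assuming a JDK recent enough to support -XX:MaxRAMPercentage (JDK 10+ or 8u191+) and a hypothetical container named java-api with a placeholder image:

```yaml
containers:
- name: java-api                                  # hypothetical name
  image: registry.example.com/java-api:1.0.0      # placeholder image
  env:
  - name: JAVA_TOOL_OPTIONS                       # picked up by the JVM at startup
    value: "-XX:MaxRAMPercentage=75.0"            # max heap = 75% of the container memory limit
  resources:
    requests:
      memory: "1Gi"
    limits:
      memory: "1Gi"                               # 75% of 1Gi gives a 768Mi max heap
```

The percentage follows the limit automatically, so raising the memory limit later does not require touching the JVM flags.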

Q: Should I set the same resources in all environments?

No. Dev/test can use lower values. But the ratio between request and limit should be similar, and the resource profile should be close enough to catch misconfigurations before production.

Q: Can I use HPA and VPA together?

Yes, carefully. Use HPA for replica scaling (CPU or custom metrics) and run VPA in Off mode for right-sizing guidance, or restrict it to a resource HPA does not scale on (for example controlledResources: ["memory"]). Never have both managing the same resource simultaneously.

Q: My cluster uses cgroups v2. Does CPU throttling still apply?

Improved but not eliminated. cgroups v2 replaces cpu.shares with cpu.weight and exposes the quota through cpu.max rather than cpu.cfs_quota_us, but the same per-period CFS bandwidth enforcement still applies whenever a CPU limit is set. For latency-sensitive workloads, the case for not setting CPU limits remains valid on cgroups v2.

Q: What is a realistic cluster overcommit ratio?

CPU: 5–10x overcommit (total requests vs physical cores) is common for mixed workloads with accurate requests. Memory: 1.5–2x cluster-level overcommit is manageable at 1.5x request:limit ratios. Beyond 2x, node memory pressure events become frequent.

Q: LimitRange is set but pods are still admitted without resources. Why?

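LimitRange defaults are applied only at admission time, so pods created before the LimitRange existed keep whatever they had; recreate them to pick up the defaults. Defaults are also filled in per resource: a container that sets requests.cpu but not limits.cpu receives the LimitRange default as its CPU limit, but only if the LimitRange defines default and defaultRequest for that resource. Finally, verify the LimitRange is in the correct namespace: kubectl get limitrange -n <namespace>.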

Q: What does a pod with no memory limit do to a node?

It can consume all available node memory unconstrained. This triggers the Linux OOM killer at the node level, which may kill processes outside the container — including the kubelet itself in extreme cases. Memory limits are non-negotiable in production.


Tested against Kubernetes 1.28–1.32. cgroups v2 behavior noted where it differs from v1. VPA examples use autoscaling.k8s.io/v1 API (VPA 0.14+).

Kubernetes Security Best Practices: A Production Hardening Guide

Kubernetes security is not a single feature you enable — it is a layered discipline that spans the control plane, workloads, networking, supply chain, and runtime. This guide covers the security controls that matter most in production, why each one exists, and how to implement them without breaking your cluster.

The Kubernetes Attack Surface

Before hardening anything, understand what you are protecting. A Kubernetes cluster has several distinct attack surfaces:

  • API server — The central control plane. Any entity that can reach it with valid credentials can read cluster state, modify workloads, or escalate privileges.
  • etcd — Stores all cluster state in plain text, including Secrets. Direct etcd access is equivalent to root on every node.
  • Nodes — A compromised node can access all Secrets mounted on pods running on it, access the kubelet API, and potentially escape to the hypervisor.
  • Pods — Privileged pods, host-network pods, and pods with excessive capabilities can break container isolation.
  • Supply chain — Malicious images, compromised registries, and unsigned artifacts can introduce attacker-controlled code into your cluster.
  • RBAC — Overly permissive roles allow lateral movement and privilege escalation once an attacker gains any foothold.

The controls below address each of these surfaces. Prioritize based on your threat model — a public-facing multi-tenant cluster needs all of them; an internal development cluster can relax some.

1. RBAC: Least Privilege from Day One

Role-Based Access Control is Kubernetes’ primary authorization mechanism. Most clusters fail at RBAC not because it is misconfigured, but because it is over-permissive by default and nobody reviews it systematically.

Common RBAC Mistakes

  • Binding to cluster-admin for convenience. Almost no workload needs cluster-admin. Use namespaced roles wherever possible.
  • Using * verbs or resources in roles. Wildcard permissions are almost always broader than intended.
  • Not auditing ServiceAccount token usage. Every pod gets a ServiceAccount. The default ServiceAccount in most namespaces has no permissions, but custom workloads often get over-permissive SAs.
  • Forgetting automountServiceAccountToken: false. If a workload does not need to talk to the Kubernetes API, disable token mounting entirely.
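
As a sketch of the last point, token mounting can be turned off on the ServiceAccount, on the pod, or both. The my-app names follow the example used in this section, and the image is a placeholder.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: my-app
automountServiceAccountToken: false     # no token mounted for pods using this ServiceAccount
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: my-app
spec:
  serviceAccountName: my-app
  automountServiceAccountToken: false   # pod-level setting takes precedence over the ServiceAccount
  containers:
  - name: app
    image: registry.company.com/my-app:1.0.0   # placeholder image
```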

Practical RBAC Patterns

For a workload that only needs to read ConfigMaps in its own namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-configmap-reader
  namespace: my-app
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: my-app
roleRef:
  kind: Role
  name: configmap-reader
  apiGroup: rbac.authorization.k8s.io

Audit existing RBAC with kubectl-who-can or rbac-tool to find overly permissive bindings before attackers do.

2. Pod Security Standards

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement is Pod Security Admission (PSA), a built-in admission controller that enforces one of three security profiles at the namespace level:

  • Privileged — No restrictions. For system components only.
  • Baseline — Prevents the most critical privilege escalations: privileged containers, hostPID, hostIPC, hostNetwork, dangerous capabilities.
  • Restricted — Enforces current hardening best practices. Requires running as non-root, dropping all capabilities, and using a restricted seccomp profile.

Enable enforcement at the namespace level with labels:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.30
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.30
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.30

A pod that runs as root or requests host-network in a namespace enforcing restricted will be rejected at admission. The warn and audit modes let you test before enforcing.

PSA covers the most critical pod-level escalations, but it is coarse-grained. For fine-grained policy control, use Kyverno alongside PSA.

3. Network Policies: Micro-Segmentation

By default, every pod in a Kubernetes cluster can communicate with every other pod across all namespaces. This is a flat network model that gives attackers unrestricted lateral movement once they compromise any workload.

Network Policies define L3/L4 allow-rules for pod-to-pod communication. They are enforced by your CNI plugin (Calico, Cilium, Weave — not Flannel, which does not support NetworkPolicy).

Default Deny Pattern

Start by denying all ingress and egress in every namespace, then open only what is explicitly needed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Then allow specific traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api
    ports:
    - protocol: TCP
      port: 5432

Do not forget DNS egress — most workloads need to resolve names via kube-dns, which requires UDP port 53 egress to the kube-system namespace.
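
A minimal sketch of that DNS rule, assuming the cluster DNS pods carry the conventional k8s-app: kube-dns label (true for default CoreDNS installs, but worth verifying in your cluster) and that the kube-system namespace has the automatically applied kubernetes.io/metadata.name label:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:               # same list item as namespaceSelector: both must match
        matchLabels:
          k8s-app: kube-dns      # assumed label on the DNS pods
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP              # DNS falls back to TCP for large responses
      port: 53
```

Network policies are additive, so this can sit alongside the default-deny policy without weakening it for anything other than DNS.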

4. Secrets Management

Kubernetes Secrets are base64-encoded, not encrypted, and by default they are stored in etcd unencrypted. Anyone with get permission on Secrets can read them. This is not a vulnerability; it is a design decision that puts the responsibility on you to:

  • Enable encryption at rest for etcd. Configure EncryptionConfiguration with an AES-CBC or AES-GCM provider. This encrypts Secrets before they are written to etcd.
  • Use external secret stores. HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault with the External Secrets Operator means actual secret values never live in Kubernetes at all.
  • Restrict Secret RBAC aggressively. Never give list on Secrets cluster-wide — it returns all values. Use get on named resources where possible.
  • Avoid environment variables for secrets. Prefer volume mounts. Env vars are visible in pod inspect output and can leak through application logging.

# etcd encryption at rest - in kube-apiserver config
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}
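
The RBAC and volume-mount recommendations in the list above can be sketched together. The secret name db-credentials, the Role name, and the image are hypothetical. The Role is for humans or controllers that must read the Secret through the API; the pod itself needs no RBAC to mount it.

```yaml
# Grant get on one named Secret only (no list, no wildcard)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: db-credentials-reader        # hypothetical name
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["db-credentials"]  # hypothetical Secret
  verbs: ["get"]
---
# Mount the Secret as read-only files instead of environment variables
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: my-app
spec:
  containers:
  - name: app
    image: registry.company.com/my-app:1.0.0   # placeholder image
    volumeMounts:
    - name: db-credentials
      mountPath: /etc/secrets/db     # each key in the Secret becomes a file here
      readOnly: true
  volumes:
  - name: db-credentials
    secret:
      secretName: db-credentials
```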

5. Image Security and Supply Chain

Your runtime security posture is only as good as the images you run. A compromised image from a public registry bypasses every runtime control you have.

Scan images in CI

Use Trivy, Grype, or Snyk to scan images as part of your CI pipeline. Block deployments of images with critical CVEs:

# In your CI pipeline
trivy image --exit-code 1 --severity CRITICAL your-image:tag

Use a private registry with admission control

Only allow images from your private registry using an admission webhook (Kyverno, OPA Gatekeeper). This prevents developers from running arbitrary public images in production:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-registries
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Images must come from registry.company.com"
      pattern:
        spec:
          containers:
          - image: "registry.company.com/*"

Use distroless or minimal base images

Distroless images contain only the application and its runtime dependencies — no shell, no package manager, no debugging tools. This drastically reduces the attack surface and the number of CVEs. Google’s distroless images are available for Java, Node.js, Python, and Go.

Sign and verify images

Cosign (from the Sigstore project) lets you sign container images and verify signatures at admission time using Kyverno or Connaisseur. This prevents image substitution attacks where an attacker replaces a legitimate image in your registry.
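
A sketch of signature verification with Kyverno, assuming images are signed with a Cosign key pair and reusing the registry.company.com registry from the earlier policy. The policy name is hypothetical and the public key is a placeholder you would replace with your own.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures      # hypothetical name
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30          # verification calls the registry, so allow extra time
  rules:
  - name: verify-cosign-signature
    match:
      any:
      - resources:
          kinds: ["Pod"]
    verifyImages:
    - imageReferences:
      - "registry.company.com/*"
      attestors:
      - entries:
        - keys:
            publicKeys: |-
              -----BEGIN PUBLIC KEY-----
              <your Cosign public key>
              -----END PUBLIC KEY-----
```

Pods whose images match the reference but carry no valid signature for that key are rejected at admission.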

6. Runtime Security

Runtime security detects and responds to malicious activity after a container is running. The primary tool in this space is Falco — a CNCF project that uses eBPF to monitor system calls and raise alerts when containers behave unexpectedly.

Default Falco rules catch common attack patterns:

  • Shell spawned in a container
  • Network connection to an unexpected IP
  • Write to a sensitive file path (/etc/passwd, /etc/shadow)
  • Privilege escalation via setuid binaries
  • Container drift (new executable files written at runtime)

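Combine Falco with seccomp profiles to restrict the system calls a container can make at the kernel level. The RuntimeDefault seccomp profile is the container runtime's built-in filter; it blocks several dozen system calls that containers virtually never need. It is applied only when a pod requests it, unless the kubelet runs with --seccomp-default (stable since Kubernetes 1.27).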

spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 65534
      capabilities:
        drop: ["ALL"]

These four securityContext settings (allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsNonRoot: true, capabilities.drop: ALL), together with the RuntimeDefault seccomp profile set at the pod level, make container escape significantly harder and satisfy the Kubernetes Restricted pod security standard.

7. API Server Hardening

The API server is the most critical component to harden. Key settings:

  • Disable anonymous authentication. --anonymous-auth=false ensures every request is authenticated.
  • Enable audit logging. Log all API server requests to a file or webhook. Without audit logs, you cannot investigate incidents or detect RBAC abuse.
  • Restrict admission plugins. Ensure NodeRestriction is enabled — it prevents node kubelets from modifying objects outside their own node.
  • Do not expose the API server to the internet. Use a VPN, bastion host, or private endpoint. If you must expose it, restrict access by IP.

# Minimal audit policy - log all requests at metadata level,
# and full request bodies only for RBAC changes.
# Secrets and ConfigMaps stay at Metadata: logging their bodies
# would copy secret values into the audit log.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
- level: Metadata
  omitStages: ["RequestReceived"]
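
An audit policy only takes effect once the API server is pointed at it. A sketch of the relevant flags, assuming a kubeadm-style static pod manifest; the file paths and retention values are illustrative assumptions.

```yaml
# Fragment of /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm layout assumed)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --audit-policy-file=/etc/kubernetes/audit/policy.yaml   # assumed path
    - --audit-log-path=/var/log/kubernetes/audit/audit.log    # assumed path
    - --audit-log-maxage=30       # days to retain rotated log files
    - --audit-log-maxbackup=10    # number of rotated files to keep
    - --audit-log-maxsize=100     # size in MB before a file is rotated
    # ...existing flags stay unchanged...
```

Because the API server runs in a container, both the policy file and the log directory also need hostPath volumes and matching volumeMounts in the same manifest.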

8. etcd Security

etcd stores all cluster state. Treat it as sensitive as your production database:

  • Enable TLS for all etcd communication. Both peer communication (etcd-to-etcd) and client communication (apiserver-to-etcd) must use mutual TLS.
  • Restrict network access to etcd. etcd should only be reachable by the API server. Use firewall rules or security groups to enforce this.
  • Enable encryption at rest. As described in the Secrets section above.
  • Backup etcd regularly. An etcd snapshot is a complete copy of all cluster state, including all Secrets. Encrypt backups and store them separately from the cluster.
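
The TLS requirements above map to a small set of etcd flags. A sketch, assuming a kubeadm-style static pod manifest and kubeadm's default certificate paths; adjust both for other installers.

```yaml
# Fragment of /etc/kubernetes/manifests/etcd.yaml (kubeadm layout and paths assumed)
spec:
  containers:
  - name: etcd
    command:
    - etcd
    # Client traffic (apiserver-to-etcd): require and verify client certificates
    - --client-cert-auth=true
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    # Peer traffic (etcd-to-etcd): mutual TLS between cluster members
    - --peer-client-cert-auth=true
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    # ...existing flags stay unchanged...
```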

9. CIS Kubernetes Benchmark

The CIS Kubernetes Benchmark is a comprehensive checklist of security controls covering the control plane, nodes, and workloads. Running kube-bench against your cluster gives you a scored assessment against the CIS controls:

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs $(kubectl get pods -l app=kube-bench -o name)

kube-bench outputs PASS/FAIL/WARN for each control with remediation guidance. Run it after initial cluster setup and after major configuration changes.

10. Continuous Security Posture with Kubescape

Kubescape and similar tools (Starboard/Trivy Operator, KubeScore) provide continuous security scanning of live cluster state — not just a one-time audit. They check workloads against NSA/CISA hardening guidelines, MITRE ATT&CK framework, and the CIS benchmark in real time.

Deploy Trivy Operator for continuous in-cluster scanning:

helm repo add aquasecurity https://aquasecurity.github.io/helm-charts/
helm install trivy-operator aquasecurity/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  --set="trivy.ignoreUnfixed=true"

Trivy Operator creates VulnerabilityReport, ConfigAuditReport, and RbacAssessmentReport custom resources in the same namespace as each workload. These can be scraped by Prometheus and displayed in Grafana for a security dashboard.

Security Hardening Checklist

  • ✅ RBAC reviewed — no wildcard roles, no unnecessary cluster-admin bindings
  • ✅ ServiceAccount token automount disabled for workloads that do not need API access
  • ✅ Pod Security Standards enforced at namespace level (at least Baseline, Restricted where possible)
  • ✅ Network policies deployed — default deny with explicit allows
  • ✅ Secrets encrypted at rest in etcd
  • ✅ Images scanned in CI — no critical CVEs in production
  • ✅ Private registry enforced via admission control
  • ✅ Container securityContext hardened (non-root, read-only filesystem, no capabilities)
  • ✅ seccomp RuntimeDefault profile enabled
  • ✅ API server audit logging enabled
  • ✅ etcd TLS and network access restricted
  • ✅ kube-bench run and critical/high findings remediated
  • ✅ Runtime security (Falco) deployed and alerts routed to on-call
  • ✅ Continuous scanning (Trivy Operator or Kubescape) deployed

FAQ

Where do I start if my cluster has no security controls today?

Start with the highest-impact, lowest-effort controls first: audit your RBAC (revoke cluster-admin where not needed), enable Pod Security Admission in warn mode on all namespaces, and deploy Trivy Operator. These three steps give you immediate visibility and prevent the most common privilege escalations without breaking anything.

Does enabling Network Policies break DNS resolution?

Yes, if you deploy a default-deny egress policy without explicitly allowing DNS. Add an egress rule allowing port 53 over both UDP and TCP (large responses fall back to TCP) to the kube-dns service in kube-system when applying default-deny network policies.
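
A minimal sketch of such a DNS allow rule follows. It assumes the cluster DNS pods carry the common k8s-app: kube-dns label (true for most CoreDNS installs, but check yours) and uses a hypothetical namespace name:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments   # hypothetical namespace name
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label on the DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Apply it alongside the default-deny policy so that name resolution keeps working from the moment egress is locked down.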

Is Kubernetes certified for PCI-DSS or SOC 2?

Kubernetes itself is not certified — your configuration and the controls you implement determine compliance. The CIS Kubernetes Benchmark maps to many PCI-DSS and SOC 2 requirements. Managed Kubernetes offerings (EKS, GKE, AKS) have their own compliance certifications for the underlying infrastructure.

Should I use OPA Gatekeeper or Kyverno?

Both enforce admission policies, but Kyverno is Kubernetes-native (policies are written as YAML) while Gatekeeper uses Rego (a purpose-built policy language). For teams without Rego expertise, Kyverno is significantly faster to adopt and maintain. For teams already using OPA elsewhere in their stack, Gatekeeper offers consistency. Both integrate well with GitOps workflows.

How often should I update Kubernetes for security patches?

Apply patch releases within 30 days of release when they fix CVEs rated High or Critical. Minor version upgrades (e.g., 1.29 → 1.30) should happen within the support window: Kubernetes maintains release branches for the three most recent minor versions. Once your version falls outside that window, it no longer receives security patches, and every new CVE widens the gap.

For a deeper look at how security fits into the broader Kubernetes platform architecture, see the Kubernetes architecture patterns guide and the guide on building a security-first Kubernetes culture.

ArgoCD Guide: GitOps Continuous Delivery for Kubernetes

ArgoCD has become the de facto standard for GitOps-based continuous delivery in Kubernetes. If you are running production workloads on Kubernetes and still deploying with raw kubectl apply or untracked Helm releases, ArgoCD solves a class of problems you may not even know you have yet. This guide covers everything from core concepts to production-grade configuration.

The Problem ArgoCD Solves

Traditional CI/CD pushes deployments into a cluster. A CI system runs tests, builds an image, and then executes kubectl apply or helm upgrade against the cluster. This model has several structural problems:

  • Drift goes undetected. Someone applies a hotfix directly to the cluster. Now your Git repository no longer reflects reality, and nobody knows it.
  • No single source of truth. The cluster state is authoritative, not Git. Your desired state and actual state can diverge silently.
  • Rollback is painful. Rolling back a bad deployment means re-running old CI pipelines or manually reversing changes, neither of which is fast.
  • Multi-cluster management compounds the problem. Each cluster becomes a snowflake with its own history of undocumented changes.

GitOps inverts this model. Git is the source of truth. The cluster pulls its desired state from Git and continuously reconciles toward it. ArgoCD is the most mature GitOps operator for Kubernetes, implementing this pull-based model with a production-ready feature set.

How ArgoCD Works: Core Architecture

ArgoCD runs as a set of controllers inside your Kubernetes cluster. The core components are:

  • Application Controller — Watches both the Git repository and the live cluster state. Computes the diff and drives reconciliation.
  • API Server — Exposes the gRPC/REST API consumed by the CLI, UI, and external systems.
  • Repository Server — Generates Kubernetes manifests from source (Helm, Kustomize, plain YAML, Jsonnet).
  • Redis — Caches cluster state and repository data to reduce API server load.
  • Dex (optional) — Provides OIDC authentication for SSO integration.

The fundamental unit in ArgoCD is an Application — a CRD that maps a source (a path in a Git repo at a specific revision) to a destination (a namespace in a cluster). ArgoCD continuously compares the desired state from Git with the live state in the cluster and reports on the sync status.

Sync Status vs Health Status

Two orthogonal concepts you need to understand from day one:

  • Sync Status — Does the live state match what Git says it should be? Values: Synced, OutOfSync, Unknown.
  • Health Status — Is the application actually working? Values: Healthy, Progressing, Degraded, Suspended, Missing, Unknown.

An application can be Synced but Degraded — the manifests were applied correctly, but a pod is crash-looping. Conversely, it can be OutOfSync but Healthy — someone applied a change directly to the cluster outside of Git.

Installing ArgoCD

The official installation method uses a single manifest. For production, always pin to a specific version:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.11.0/manifests/install.yaml

This deploys ArgoCD in the argocd namespace with full cluster-admin access. For a production HA setup, use the manifests/ha/install.yaml variant, which runs multiple replicas of the API server and repository server together with a highly available Redis.

Accessing the UI and CLI

The initial admin password is auto-generated and stored in a secret:

argocd admin initial-password -n argocd

For local access, port-forward the API server:

kubectl port-forward svc/argocd-server -n argocd 8080:443

Then log in via the CLI:

argocd login localhost:8080 --username admin --password <password> --insecure

For production, expose the ArgoCD server via an Ingress or LoadBalancer with a proper TLS certificate. If you’re using NGINX Ingress Controller:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server-ingress
  namespace: argocd
  annotations:
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  rules:
  - host: argocd.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: argocd-server
            port:
              number: 443

Defining Your First Application

Applications can be created via the UI, the CLI, or declaratively with a YAML manifest. The declarative approach is the recommended one — it means your ArgoCD configuration itself is in Git:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-app
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

Key fields to understand:

  • targetRevision — Can be a branch name, tag, or commit SHA. For production, pin to a tag rather than HEAD.
  • path — The directory within the repo containing your Kubernetes manifests.
  • automated.prune — Automatically delete resources that are no longer in Git. Required for true GitOps but use carefully — it will delete things.
  • automated.selfHeal — Automatically revert manual changes made directly to the cluster. This is what enforces Git as the single source of truth.

Helm Integration

ArgoCD has native Helm support. It can deploy Helm charts directly from chart repositories or from your Git repository. You can override values per environment:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-stack
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 58.4.0
    helm:
      releaseName: prometheus-stack
      valuesObject:
        grafana:
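          # ArgoCD does not expand ${VAR} placeholders in Helm values, so
          # reference a pre-created Secret instead. The Secret name
          # "grafana-admin" is an assumption; create it out of band.
          admin:
            existingSecret: grafana-admin
            userKey: admin-user
            passwordKey: admin-password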
        prometheus:
          prometheusSpec:
            retention: 30d
            storageSpec:
              volumeClaimTemplate:
                spec:
                  storageClassName: fast-ssd
                  resources:
                    requests:
                      storage: 50Gi
  destination:
    server: https://kubernetes.default.svc
    namespace: observability

One important nuance: ArgoCD renders Helm charts with helm template and applies the resulting manifests itself; it never runs helm install. Helm hooks (pre-install, post-upgrade, etc.) are supported by mapping them to the equivalent ArgoCD sync hooks, but no Helm release is created. Running helm list will not show ArgoCD-managed applications; use argocd app list or the UI to inspect them instead.

Projects: Multi-Tenancy and Access Control

ArgoCD Projects provide multi-tenancy within a single ArgoCD instance. They let you restrict which source repositories, destination clusters, and namespaces a team can deploy to. Every Application belongs to a Project.

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform-team
  namespace: argocd
spec:
  description: Platform team applications
  sourceRepos:
  - 'https://github.com/your-org/*'
  destinations:
  - namespace: 'platform-*'
    server: https://kubernetes.default.svc
  clusterResourceWhitelist:
  - group: ''
    kind: Namespace
  namespaceResourceBlacklist:
  - group: ''
    kind: ResourceQuota

Projects are where you define the boundaries of what each team can do. The default project has no restrictions — never use it for production workloads. Create dedicated projects per team or per environment.

RBAC Configuration

ArgoCD has its own RBAC system layered on top of Kubernetes RBAC. It is configured via the argocd-rbac-cm ConfigMap. Roles are defined per project or globally:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    # Platform team has full access to platform-team project
    p, role:platform-admin, applications, *, platform-team/*, allow
    p, role:platform-admin, projects, get, platform-team, allow
    p, role:platform-admin, repositories, *, *, allow

    # Dev team can sync but not delete
    p, role:developer, applications, get, */*, allow
    p, role:developer, applications, sync, */*, allow
    p, role:developer, applications, action/*, */*, allow

    # Bind SSO groups to roles
    g, your-org:platform-team, role:platform-admin
    g, your-org:developers, role:developer

The policy.default: role:readonly ensures that any authenticated user who has no explicit role assignment gets read-only access — a safe default for production.

Multi-Cluster Management

ArgoCD can manage multiple Kubernetes clusters from a single control plane. Register external clusters with the CLI:

# First, ensure the target cluster context is in your kubeconfig
argocd cluster add production-eu-west --name production-eu-west

# Verify registration
argocd cluster list

ArgoCD will create a ServiceAccount in the target cluster and store its credentials as a Kubernetes secret in the ArgoCD namespace. Applications can then target this cluster either by name in their destination.name field or by API server URL in destination.server (set one or the other, not both).
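
As a rough sketch, an Application that targets the cluster registered above by name could look like this. The repository URL and path are carried over from the earlier example and remain placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-eu-west
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-app   # placeholder repo
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    # Cluster name as registered with "argocd cluster add --name".
    # Use destination.server instead if you prefer the API server URL.
    name: production-eu-west
    namespace: production
```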

For large-scale multi-cluster setups, consider the App of Apps pattern or ApplicationSets. The ApplicationSet controller generates Applications dynamically from generators such as cluster lists, Git directory structures, or matrix combinations:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          environment: production
  template:
    metadata:
      name: '{{name}}-addons'
    spec:
      project: platform
      source:
        repoURL: https://github.com/your-org/cluster-addons
        targetRevision: HEAD
        path: 'addons/{{metadata.labels.region}}'
      destination:
        server: '{{server}}'
        namespace: kube-system

This single ApplicationSet deploys the appropriate addons to every cluster labeled environment: production, using each cluster’s region label to select the correct path in the repository.

Sync Strategies and Waves

When deploying complex applications with dependencies between resources, you need to control the order of deployment. ArgoCD provides two mechanisms:

Sync Phases

Resources are deployed in three phases: PreSync, Sync, and PostSync. Use Sync Hooks for resources that must complete before the main sync proceeds (database migrations, certificate issuance, etc.):

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: your-app:v1.2.3
        command: ["./migrate.sh"]
      restartPolicy: Never

Sync Waves

Within each phase, waves control ordering. Resources default to wave 0, and negative values are allowed. Resources with a lower wave number are applied and must become healthy before resources with higher wave numbers are applied:

# Applied first
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"

# Applied after wave 1 is healthy
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "2"

Notifications and Alerting

ArgoCD Notifications is a standalone controller that sends alerts when Application state changes. It supports Slack, PagerDuty, GitHub commit status, email, and a dozen other providers. Configure it via the argocd-notifications-cm ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.slack: |
    token: $slack-token
  template.app-sync-failed: |
    slack:
      attachments: |
        [{
          "title": "{{.app.metadata.name}}",
          "color": "#E96D76",
          "fields": [{
            "title": "Sync Status",
            "value": "{{.app.status.sync.status}}",
            "short": true
          },{
            "title": "Message",
            "value": "{{range .app.status.conditions}}{{.message}}{{end}}",
            "short": false
          }]
        }]
  trigger.on-sync-failed: |
    - when: app.status.sync.status == 'Unknown'
      send: [app-sync-failed]
    - when: app.status.operationState != nil and app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]
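
A trigger only fires for Applications that subscribe to it. As a sketch, the subscription is an annotation on the Application; the channel name deploy-alerts is a hypothetical placeholder, and the fragment below is meant to be merged into an existing Application manifest:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
  annotations:
    # Format: notifications.argoproj.io/subscribe.<trigger>.<service>: <recipient>
    notifications.argoproj.io/subscribe.on-sync-failed.slack: deploy-alerts
```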

Secret Management with ArgoCD

ArgoCD intentionally has no secret management built in — storing secrets in Git as plain text is never acceptable. The common patterns are:

  • Sealed Secrets (Bitnami) — Encrypts secrets with a cluster-specific key. The encrypted secret can be committed to Git; only the cluster can decrypt it.
  • External Secrets Operator — Syncs secrets from Vault, AWS Secrets Manager, GCP Secret Manager, etc. into Kubernetes secrets. The ArgoCD Application manages the ExternalSecret CRD, not the actual secret value.
  • argocd-vault-plugin — A plugin that replaces placeholder values in manifests with secrets retrieved from Vault at sync time.

The External Secrets Operator approach is the most flexible for teams already using a centralized secrets backend. The Application in ArgoCD deploys ExternalSecret objects, which the ESO controller resolves at runtime without ever touching Git.
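
A minimal sketch of what gets committed to Git under this approach is shown below. The store name, remote key, and property are hypothetical, and the apiVersion depends on the External Secrets Operator release you run (newer releases also serve external-secrets.io/v1):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grafana-admin
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager   # hypothetical ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: grafana-admin         # Kubernetes Secret that ESO creates
  data:
    - secretKey: admin-password
      remoteRef:
        key: prod/grafana       # hypothetical path in the backend
        property: password
```

Only the reference lives in the repository; the value itself stays in the secrets backend.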

Production Best Practices

  • Run ArgoCD in HA mode. Use manifests/ha/install.yaml with 3 replicas of the API server and multiple application controller shards for large clusters (100+ applications).
  • Pin image versions. Never use latest for the ArgoCD image itself. Pin to a specific version and upgrade deliberately.
  • Use the App of Apps pattern for bootstrapping. A single root Application deploys all other Applications. This makes cluster bootstrapping idempotent and reproducible.
  • Separate ArgoCD config from application config. Store ArgoCD Application manifests in a dedicated gitops repository, separate from application source code.
  • Enable resource tracking via annotations. Use application.resourceTrackingMethod: annotation in argocd-cm instead of the default label-based tracking, which can conflict with Helm’s own labels.
  • Set resource limits on ArgoCD controllers. Application controller CPU and memory scale with the number of resources tracked. Monitor and tune accordingly.
  • Restrict auto-sync in production. Consider requiring manual sync approval for production environments even when using GitOps — or at minimum require a PR approval gate before changes reach the target branch.
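
The App of Apps pattern from the list above can be sketched as a single root Application whose source path contains nothing but other Application manifests. The repository URL and the apps directory are assumptions for illustration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/gitops   # hypothetical gitops repo
    targetRevision: main
    path: apps            # directory holding one Application manifest per app
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd     # child Applications are created in the argocd namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Bootstrapping a new cluster then comes down to installing ArgoCD and applying this one manifest.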

ArgoCD vs Flux

Flux v2 is the other major GitOps operator. Both are CNCF projects. The main differences in practice:

Feature | ArgoCD | Flux v2
UI | Built-in web UI | No official UI (use Weave GitOps)
Multi-cluster | Single control plane manages many clusters | Agent per cluster, pull model
ApplicationSets | Native | Kustomization + HelmRelease
Secret management | Plugin-based | SOPS native integration
Learning curve | Steeper (more concepts) | Lower (Kubernetes-native CRDs)
CNCF status | Graduated | Graduated

ArgoCD wins when you need the UI, multi-cluster management from a central plane, or have a large operations team that benefits from the visual application topology view. Flux wins when you want a simpler, purely Kubernetes-native approach with better SOPS integration for secret management.

FAQ

Can ArgoCD deploy to the cluster it runs in?

Yes. The https://kubernetes.default.svc destination refers to the local cluster. ArgoCD can manage both its own cluster and external clusters simultaneously.

Does ArgoCD support private Git repositories?

Yes. Configure repository credentials via argocd repo add with SSH keys, HTTPS username/password, or GitHub App credentials. Credentials are stored as Kubernetes secrets in the ArgoCD namespace.

How does ArgoCD handle CRD installation?

CRDs can be managed by ArgoCD, but there is a chicken-and-egg problem: if a CRD is not yet installed, ArgoCD cannot validate resources that use it. The recommended pattern is to put CRDs in wave 1 and dependent resources in wave 2, or to use a separate Application for CRDs.
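
When the CRD and the resources that use it live in the same Application, a sketch like the following can help. It uses a Prometheus Operator ServiceMonitor purely as an example of a custom resource; the SkipDryRunOnMissingResource sync option tells ArgoCD not to fail the dry run while the CRD is still missing:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  annotations:
    # Apply after the wave that installs the CRD
    argocd.argoproj.io/sync-wave: "2"
    # Skip dry-run validation if the CRD is not registered yet
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
```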

What is the difference between an Application and an AppProject?

An Application is the unit of deployment — it maps a Git source to a cluster destination. An AppProject is a grouping and access control boundary — it restricts what sources and destinations an Application within the project can use. Every Application belongs to exactly one AppProject.

How do I roll back a deployment with ArgoCD?

The GitOps way: revert the commit in Git and let ArgoCD reconcile. ArgoCD also provides a UI-based rollback to any previous sync revision, but this is considered a temporary measure — the Git history should always be updated to match.

Getting Started

The fastest path from zero to a working ArgoCD setup on a local cluster:

# 1. Create a local cluster (kind or minikube)
kind create cluster --name argocd-demo

# 2. Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# 3. Wait for pods
kubectl wait --for=condition=Ready pods --all -n argocd --timeout=120s

# 4. Get the initial admin password
argocd admin initial-password -n argocd

# 5. Port-forward and log in
kubectl port-forward svc/argocd-server -n argocd 8080:443 &
argocd login localhost:8080 --username admin --insecure

# 6. Deploy your first application
argocd app create guestbook \
  --repo https://github.com/argoproj/argocd-example-apps.git \
  --path guestbook \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace guestbook \
  --sync-policy automated \
  --sync-option CreateNamespace=true

From here, the natural next steps are integrating ArgoCD with your existing CI pipeline (CI builds and pushes the image, updates the image tag in Git, ArgoCD detects the change and syncs), configuring SSO via Dex, and setting up the App of Apps pattern for managing multiple applications declaratively.

For teams looking to go deeper on GitOps and ArgoCD in production, the Kubernetes architecture patterns guide covers how ArgoCD fits into a broader platform engineering stack alongside service mesh, policy enforcement, and observability tooling.

Istio ServiceEntry Explained: External Services, DNS, and Traffic Control

Every production Kubernetes cluster talks to the outside world. Your services call payment APIs, connect to managed databases, push events to SaaS analytics platforms, and reach legacy systems that will never run inside the mesh. By default, Istio lets all outbound traffic flow freely — or blocks it entirely if you flip outboundTrafficPolicy to REGISTRY_ONLY. Neither extreme gives you what you actually need: selective, observable, policy-controlled access to external services.

That is exactly what Istio ServiceEntry solves. It registers external endpoints in the mesh’s internal service registry so that Envoy sidecars can apply the same traffic management, security, and observability features to outbound calls that you already enjoy for east-west traffic. No new proxies, no egress gateways required for the basic case — just a YAML resource that tells the mesh “this external thing exists, and here is how to reach it.”

In this guide, I will walk through every field of the ServiceEntry spec, explain the four DNS resolution modes with real-world use cases, and show production-ready patterns for external APIs, databases, TCP services, and legacy workloads. We will also cover how to combine ServiceEntry with DestinationRule and VirtualService to get circuit breaking, retries, connection pooling, and even sticky sessions for external dependencies.

What Is a ServiceEntry

Istio maintains an internal service registry that merges Kubernetes Services with any additional entries you declare. When a sidecar proxy needs to decide how to route a request, it consults this registry. Services inside the mesh are automatically registered. Services outside the mesh are not — unless you create a ServiceEntry.

A ServiceEntry is a custom resource that adds an entry to the mesh’s service registry. Once registered, the external service becomes a first-class citizen: Envoy generates clusters, routes, and listeners for it, which means you get metrics (istio_requests_total), access logs, distributed traces, mTLS origination, retries, timeouts, circuit breaking — the full Istio feature set.

Without a ServiceEntry, outbound traffic to an external host either passes through as a raw TCP connection (in ALLOW_ANY mode) with no telemetry, or gets dropped with a 502/503 (in REGISTRY_ONLY mode). Both outcomes are undesirable in production. The ServiceEntry bridges that gap.
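
For reference, the switch between the two modes lives in the mesh configuration. The sketch below assumes an IstioOperator-based install; with a Helm install the same setting goes under meshConfig in the istiod values:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    outboundTrafficPolicy:
      # ALLOW_ANY (default) passes unknown destinations through;
      # REGISTRY_ONLY blocks anything without a registry entry.
      mode: REGISTRY_ONLY
```

With REGISTRY_ONLY in place, each external dependency needs its own ServiceEntry before workloads can reach it.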

ServiceEntry Anatomy: All Fields Explained

Let us look at a complete ServiceEntry and then break down each field.

apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: external-api
  namespace: production
spec:
  hosts:
    - api.stripe.com
  location: MESH_EXTERNAL
  ports:
    - number: 443
      name: https
      protocol: TLS
  resolution: DNS
  exportTo:
    - "."
    - "istio-system"

hosts

A list of hostnames associated with the service. For external services, this is typically the DNS name your application uses (e.g., api.stripe.com). For services using HTTP protocols, the hosts field is matched against the HTTP Host header. For non-HTTP protocols and services without a DNS name, you can use a synthetic hostname and pair it with addresses or static endpoints.

addresses

Optional virtual IP addresses associated with the service. Useful for TCP services where you want to assign a VIP that the sidecar will intercept. Not required for HTTP/HTTPS services that use hostname-based routing.

ports

The ports on which the external service is exposed. Each port needs a number, name, and protocol. The protocol matters: setting it to TLS tells Envoy to perform SNI-based routing without terminating TLS. Setting it to HTTPS means HTTP over TLS. For databases, you’ll typically use TCP.

location

MESH_EXTERNAL or MESH_INTERNAL. Use MESH_EXTERNAL for services outside your cluster (third-party APIs, managed databases). Use MESH_INTERNAL for services inside your infrastructure that are not part of the mesh — for example, VMs running in the same VPC that do not have a sidecar, or a Kubernetes Service in a namespace without injection enabled. The location affects how mTLS is applied and how metrics are labeled.

resolution

How the sidecar resolves the endpoint addresses. This is the most critical field and I will dedicate the next section to it. Options: NONE, STATIC, DNS, DNS_ROUND_ROBIN.

endpoints

An explicit list of network endpoints. Required when resolution is STATIC. Optional with DNS resolution to provide labels or locality information. Each endpoint can have an address, ports, labels, network, locality, and weight.

exportTo

Controls the visibility of this ServiceEntry across namespaces. Use "." for the current namespace only, "*" for all namespaces. In multi-team clusters, restrict exports to avoid namespace pollution.

Resolution Types: NONE vs STATIC vs DNS vs DNS_ROUND_ROBIN

The resolution field determines how Envoy discovers the IP addresses behind the service. Getting this wrong is the number one cause of ServiceEntry misconfigurations. Here is a clear breakdown.

Resolution | How It Works | Best For
NONE | Envoy uses the original destination IP from the connection. No DNS lookup by the proxy. | Wildcard entries, pass-through scenarios, services where the application already resolved the IP.
STATIC | Envoy routes to the IPs listed in the endpoints field. No DNS involved. | Services with stable, known IPs (e.g., on-prem databases, VMs with fixed IPs).
DNS | Envoy resolves the hostname asynchronously in the background and creates an endpoint per returned IP, so outlier detection can act on each IP. | External APIs behind load balancers, managed databases with DNS endpoints (RDS, CloudSQL).
DNS_ROUND_ROBIN | Envoy resolves the hostname asynchronously but uses only the first returned IP when it opens a new connection, relying on the DNS server to rotate answers. No per-IP endpoints. | Simple external services, services where you do not need per-endpoint circuit breaking.

When to Use NONE

Use NONE when you want to register a range of external IPs or wildcard hosts without Envoy performing any address resolution. This is common for broad egress policies: “allow traffic to *.googleapis.com on port 443.” Envoy will simply forward traffic to whatever IP the application resolved via kube-dns. The downside: Envoy has limited ability to apply per-endpoint policies.
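
The googleapis.com case can be sketched as follows. Wildcard hosts cannot be resolved by the proxy, which is why resolution is NONE; the resource name and namespace are placeholders:

```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: google-apis        # placeholder name
  namespace: production    # placeholder namespace
spec:
  hosts:
    - "*.googleapis.com"
  location: MESH_EXTERNAL
  ports:
    - number: 443
      name: tls
      protocol: TLS
  # Envoy forwards to whatever IP the application resolved
  resolution: NONE
```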

When to Use STATIC

Use STATIC when the external service has known, stable IP addresses that rarely change. This avoids DNS dependencies entirely. You define the IPs in the endpoints list. Classic use case: a legacy Oracle database on a fixed IP in your data center.

When to Use DNS

Use DNS for most external API integrations. Envoy performs asynchronous DNS resolution and creates a cluster endpoint for each returned IP address. This enables per-endpoint health checking and circuit breaking — critical for production reliability. This is the mode you want for services like api.stripe.com or your RDS instance endpoint.

When to Use DNS_ROUND_ROBIN

Use DNS_ROUND_ROBIN when the external hostname returns many IPs and you do not need per-IP circuit breaking. Envoy treats the hostname as a single logical endpoint: each new connection goes to the first IP in the latest DNS answer, and existing connections are kept when the records change. This is lighter weight than DNS mode and avoids creating a large number of endpoints in Envoy’s cluster configuration.

Practical Patterns

Pattern 1: External HTTP API (api.stripe.com)

The most common ServiceEntry pattern. Your application calls a third-party HTTPS API. You want Istio telemetry, and optionally retries and timeouts.

apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: stripe-api
  namespace: payments
spec:
  hosts:
    - api.stripe.com
  location: MESH_EXTERNAL
  ports:
    - number: 443
      name: tls
      protocol: TLS
  resolution: DNS

Note the protocol is TLS, not HTTPS. Since your application initiates the TLS handshake directly, Envoy handles this as opaque TLS using SNI-based routing. If your application instead sent plain HTTP and the sidecar performed TLS origination via a DestinationRule, you would set the protocol to HTTP and handle the upgrade separately. For most external APIs, though, let the application manage its own TLS.

Pattern 2: External Managed Database (RDS / CloudSQL)

Managed databases expose a DNS endpoint that resolves to one or more IPs. During failover, the DNS record changes. You need Envoy to respect DNS TTLs and route to the current primary.

apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: orders-database
  namespace: orders
spec:
  hosts:
    - orders-db.abc123.us-east-1.rds.amazonaws.com
  location: MESH_EXTERNAL
  ports:
    - number: 5432
      name: postgres
      protocol: TCP
  resolution: DNS

For TCP services, Envoy cannot use HTTP headers to route, so it relies on IP-based matching. The DNS resolution mode ensures Envoy periodically re-resolves the hostname and updates its endpoint list. This is critical for RDS multi-AZ failover scenarios where the DNS endpoint flips to a new IP.

Pattern 3: Legacy Internal Service Not in the Mesh

You have a monitoring service running on a set of VMs at known IP addresses inside your VPC. It is not part of the mesh, but your meshed services need to talk to it.

apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: legacy-monitoring
  namespace: observability
spec:
  hosts:
    - legacy-monitoring.internal
  location: MESH_INTERNAL
  ports:
    - number: 8080
      name: http
      protocol: HTTP
  resolution: STATIC
  endpoints:
    - address: 10.0.5.10
    - address: 10.0.5.11
    - address: 10.0.5.12

Key differences: location is MESH_INTERNAL because the service lives inside your network, and resolution is STATIC because we know the IPs. The hostname legacy-monitoring.internal is synthetic: your application uses it, so it must resolve to something (via Istio’s DNS proxy or a CoreDNS entry), and Envoy then load-balances requests across the listed endpoints.

Pattern 4: TCP Services with Multiple Ports

Some external services expose multiple ports with different protocols. For example, an Elasticsearch cluster exposes its HTTP API on 9200 and its node-to-node transport protocol on 9300.

apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: external-elasticsearch
  namespace: search
spec:
  hosts:
    - es.example.com
  location: MESH_EXTERNAL
  ports:
    - number: 9200
      name: http
      protocol: HTTP
    - number: 9300
      name: transport
      protocol: TCP
  resolution: DNS

Each port gets its own Envoy listener configuration. The HTTP port benefits from full Layer 7 telemetry and traffic management. The TCP port gets Layer 4 metrics and connection-level policies.

Combining ServiceEntry with DestinationRule

A ServiceEntry alone registers the external service. To apply traffic policies — connection pooling, circuit breaking, TLS origination, load balancing — you pair it with a DestinationRule. This is where things get powerful.

Connection Pooling and Circuit Breaking

External APIs have rate limits. Your managed database has a maximum connection count. Protecting these dependencies at the mesh level prevents cascading failures.

apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: stripe-api
  namespace: payments
spec:
  hosts:
    - api.stripe.com
  location: MESH_EXTERNAL
  ports:
    - number: 443
      name: tls
      protocol: TLS
  resolution: DNS
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: stripe-api-dr
  namespace: payments
spec:
  host: api.stripe.com
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 50
        connectTimeout: 5s
      http:
        h2UpgradePolicy: DO_NOT_UPGRADE
        maxRequestsPerConnection: 100
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 100

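This configuration caps outbound connections to Stripe at 50 per sidecar (the limit applies to each Envoy proxy, not to the mesh as a whole), sets a 5-second connection timeout, and ejects endpoints after 3 consecutive errors. Because the port is declared as TLS, Envoy cannot see HTTP status codes here, so the errors that count toward ejection are connection-level failures such as timeouts and resets, and the http settings only take effect when the sidecar handles the traffic as HTTP. In production, this prevents a degraded third-party API from consuming all your connection slots and causing a domino effect across your services.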
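
The same idea applies to the managed database from Pattern 2, where the concern is the connection limit rather than rate limiting. A sketch for that case, with illustrative numbers that you would size against the database's own max connections and your replica count:

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: orders-database-dr
  namespace: orders
spec:
  host: orders-db.abc123.us-east-1.rds.amazonaws.com
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 20   # per sidecar; illustrative value
        connectTimeout: 5s
        tcpKeepalive:
          time: 300s         # idle time before the first keepalive probe
          interval: 60s      # time between probes
```

Since this is plain TCP, only the tcp settings are used; there is no HTTP layer for Envoy to act on.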

TLS Origination

Sometimes your application speaks plain HTTP, but the external service requires HTTPS. Instead of modifying application code, you can offload TLS origination to the sidecar.

apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: external-api
  namespace: default
spec:
  hosts:
    - api.external-service.com
  location: MESH_EXTERNAL
  ports:
    - number: 80
      name: http
      protocol: HTTP
    - number: 443
      name: https
      protocol: TLS
  resolution: DNS
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: external-api-tls
  namespace: default
spec:
  host: api.external-service.com
  trafficPolicy:
    portLevelSettings:
      - port:
          number: 443
        tls:
          mode: SIMPLE

Your application sends plain HTTP to port 80. A VirtualService then routes that traffic to port 443, and the DestinationRule originates TLS to the external endpoint. The application never knows TLS happened.
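The port 80 to 443 routing step is not spelled out in the manifests above. A minimal sketch of a VirtualService that could do it, assuming the ServiceEntry and DestinationRule from this section are applied as written; the resource name external-api-vs is illustrative and not part of the original setup:

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: external-api-vs        # hypothetical name
  namespace: default
spec:
  hosts:
    - api.external-service.com
  http:
    # Match plain HTTP requests that the application sends to port 80
    - match:
        - port: 80
      route:
        # Forward to the 443 port, where the DestinationRule originates TLS
        - destination:
            host: api.external-service.com
            port:
              number: 443
```

The match on port 80 keeps the rule from touching traffic the application already sends to port 443 over its own TLS connection.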

Combining ServiceEntry with VirtualService

VirtualService gives you Layer 7 traffic management for external services: retries, timeouts, fault injection, header-based routing, and traffic shifting. This is invaluable when you are migrating between API providers or need resilience policies for unreliable external dependencies.

Retries and Timeouts

apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: stripe-api-vs
  namespace: payments
spec:
  hosts:
    - api.stripe.com
  http:
    - route:
        - destination:
            host: api.stripe.com
            port:
              number: 443
      timeout: 10s
      retries:
        attempts: 3
        perTryTimeout: 3s
        retryOn: connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes
        retryRemoteLocalities: true

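This applies a 10-second overall timeout with up to 3 retries (3 seconds per try) for specific failure conditions. The overall timeout bounds the whole exchange: an initial try plus 3 retries at 3 seconds each could take 12 seconds or more, so the request fails at the 10-second mark even if a retry is still in flight. Note that this only works for HTTP-protocol ServiceEntries. For TLS-protocol entries, where Envoy cannot see the HTTP layer, you are limited to connection-level settings such as connectTimeout in the DestinationRule.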

Traffic Shifting Between External Providers

Migrating from one external API to another? Use weighted routing to shift traffic gradually.

apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: geocoding-primary
  namespace: geo
spec:
  hosts:
    - geocoding.internal
  location: MESH_EXTERNAL
  ports:
    - number: 443
      name: http
      protocol: HTTP  # the app sends plain HTTP; http routes and TLS origination both need a port Envoy can read
  resolution: DNS  # the endpoints below are hostnames; STATIC accepts IP addresses only
  endpoints:
    - address: api.old-geocoding-provider.com
      labels:
        provider: old
    - address: api.new-geocoding-provider.com
      labels:
        provider: new
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: geocoding-dr
  namespace: geo
spec:
  host: geocoding.internal
  trafficPolicy:
    tls:
      mode: SIMPLE
  subsets:
    - name: old-provider
      labels:
        provider: old
    - name: new-provider
      labels:
        provider: new
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: geocoding-vs
  namespace: geo
spec:
  hosts:
    - geocoding.internal
  http:
    - route:
        - destination:
            host: geocoding.internal
            subset: old-provider
          weight: 80
        - destination:
            host: geocoding.internal
            subset: new-provider
          weight: 20

This sends 80% of geocoding traffic to the old provider and 20% to the new one. Adjust the weights as you gain confidence. Fully reversible — just set the old provider back to 100%.
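As a sketch of a later migration stage, the same VirtualService with the weights moved to an even split might look like this. The 50/50 ratio is an arbitrary illustration; the resource and subset names are the ones defined above.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: geocoding-vs
  namespace: geo
spec:
  hosts:
    - geocoding.internal
  http:
    - route:
        # Weights across all destinations of a route should add up to 100
        - destination:
            host: geocoding.internal
            subset: old-provider
          weight: 50
        - destination:
            host: geocoding.internal
            subset: new-provider
          weight: 50
```

Because only the weights change between stages, each step is a small, reviewable diff that is easy to revert.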

DNS Resolution Patterns: Istio DNS Proxy vs kube-dns

Istio DNS resolution for external services involves two layers: how your application resolves the hostname (kube-dns / CoreDNS), and how the sidecar resolves the hostname (Envoy’s async DNS or Istio’s DNS proxy). Understanding the interplay is crucial for reliable Istio DNS behavior.

Default Flow (Without Istio DNS Proxy)

Your application calls api.stripe.com. kube-dns resolves it to an IP. The application opens a connection to that IP. The sidecar intercepts the connection and — if the ServiceEntry uses DNS resolution — Envoy independently resolves api.stripe.com to determine its endpoint list. Two separate DNS lookups happen, which can lead to inconsistencies if DNS records change between the two resolutions.

With Istio DNS Proxy (dns.istio.io)

Istio’s sidecar includes a DNS proxy that intercepts DNS queries from the application. When enabled (via meshConfig.defaultConfig.proxyMetadata.ISTIO_META_DNS_CAPTURE and ISTIO_META_DNS_AUTO_ALLOCATE), the proxy can:

  • Auto-allocate virtual IPs for ServiceEntry hosts that do not have addresses defined, which is critical for TCP ServiceEntries that need IP-based matching.
  • Resolve ServiceEntry hosts directly, avoiding the round-trip to kube-dns for known mesh services.
  • Ensure consistency between the application’s DNS resolution and the sidecar’s endpoint resolution.

Do not assume DNS capture is on. In sidecar mode it has generally been an opt-in setting (ambient mode enables it by default), so check what your proxies are actually running with:

istioctl proxy-config bootstrap <pod-name> -n <namespace> | grep -A2 "ISTIO_META_DNS"
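If the variables are missing, one way to turn the DNS proxy on mesh-wide is through proxyMetadata. A minimal sketch, assuming an IstioOperator-based install (Helm users would set the same meshConfig values); newer Istio releases may handle address auto-allocation differently, so check the documentation for your version:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # Intercept application DNS queries in the sidecar
        ISTIO_META_DNS_CAPTURE: "true"
        # Allocate virtual IPs for ServiceEntry hosts that define no addresses
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
```

Existing workloads typically need a restart before their sidecars pick up the new proxy settings.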

When DNS Proxy Matters Most

The DNS proxy is especially important for TCP ServiceEntries without an explicit addresses field. Without a VIP, Envoy cannot match an incoming TCP connection to the correct ServiceEntry because there is no HTTP Host header to inspect. The DNS proxy solves this by auto-allocating a VIP from the 240.240.0.0/16 range and returning that VIP when the application resolves the hostname. The sidecar then intercepts traffic to that VIP and routes it to the correct external endpoint.

Sticky Sessions with ServiceEntry

Some external services require session affinity — for example, a legacy service that stores session state in memory, or a WebSocket endpoint that must maintain a persistent connection to the same backend. Istio supports sticky sessions for external services through consistent hashing in a DestinationRule.

apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: legacy-session-service
  namespace: default
spec:
  hosts:
    - legacy-session.internal
  location: MESH_INTERNAL
  ports:
    - number: 8080
      name: http
      protocol: HTTP
  resolution: STATIC
  endpoints:
    - address: 10.0.1.10
    - address: 10.0.1.11
    - address: 10.0.1.12
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: legacy-session-dr
  namespace: default
spec:
  host: legacy-session.internal
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpCookie:
          name: SERVERID
          ttl: 3600s

This configuration hashes on an HTTP cookie named SERVERID. If the cookie does not exist, Envoy generates one and sets it on the response so that subsequent requests from the same client stick to the same endpoint. You can also hash on:

  • HTTP header: consistentHash.httpHeaderName: "x-user-id" — useful when your application sends a user identifier in every request.
  • Source IP: consistentHash.useSourceIp: true — simplest option but breaks in environments with NAT or shared egress IPs.
  • Query parameter: consistentHash.httpQueryParameterName: "session_id" — for REST APIs that include a session identifier in the URL.

Sticky sessions with ServiceEntry work identically to in-mesh sticky sessions. The key requirement is that the ServiceEntry must use STATIC or DNS resolution (not NONE) so that Envoy has multiple endpoints to hash across. With DNS_ROUND_ROBIN, there is only one logical endpoint, so consistent hashing has no effect.
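For comparison, a header-based variant of the same DestinationRule could look like the sketch below. It assumes every request carries an x-user-id header, which is an illustrative header name; requests without it will not get stable affinity.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: legacy-session-dr
  namespace: default
spec:
  host: legacy-session.internal
  trafficPolicy:
    loadBalancer:
      consistentHash:
        # Hash on a request header instead of a cookie
        httpHeaderName: x-user-id
```

Only one hash key can be set per consistentHash block, so pick the identifier that is most reliably present on every request.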

Troubleshooting Common Issues

503 Errors When Calling External Services

The most common ServiceEntry issue. Start with this diagnostic sequence:

# Check if the ServiceEntry is applied and visible to the proxy
istioctl proxy-config cluster <pod-name> -n <namespace> | grep <external-host>

# Check the listeners
istioctl proxy-config listener <pod-name> -n <namespace> --port <port>

# Look at Envoy access logs for the specific request
kubectl logs <pod-name> -n <namespace> -c istio-proxy | grep <external-host>

Common causes of 503 errors:

  • Wrong protocol: Setting protocol: HTTPS when your application initiates TLS. Use TLS for pass-through; use HTTP only if the sidecar does TLS origination.
  • Missing ServiceEntry in REGISTRY_ONLY mode: If outboundTrafficPolicy is REGISTRY_ONLY, any host without a ServiceEntry is blocked.
  • exportTo restriction: The ServiceEntry is in namespace A, exported only to ".", and the calling pod is in namespace B.
  • DNS resolution failure: Envoy cannot resolve the hostname. Check that the DNS servers are reachable from the pod.
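Two of the causes above come down to configuration that is easy to check. The sketch below shows where each setting lives, assuming an IstioOperator-based install and reusing the stripe-api ServiceEntry from earlier; the checkout namespace is a hypothetical second caller.

```yaml
# Mesh-wide setting: hosts without a ServiceEntry are blocked
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY
---
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: stripe-api
  namespace: payments
spec:
  hosts:
    - api.stripe.com
  # Visibility: "." is the ServiceEntry's own namespace, "*" is every namespace
  exportTo:
    - "."
    - checkout        # hypothetical namespace that also calls Stripe
  location: MESH_EXTERNAL
  ports:
    - number: 443
      name: tls
      protocol: TLS
  resolution: DNS
```

If exportTo is omitted, the ServiceEntry is visible to all namespaces unless the mesh-wide default has been changed.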

DNS Resolution Failures

When Envoy’s async DNS resolver fails, you will see UH (upstream unhealthy) or UF (upstream connection failure) flags in access logs.

# List the endpoints Envoy has resolved for the external host (Envoy admin /clusters)
kubectl exec <pod-name> -n <namespace> -c istio-proxy -- \
  pilot-agent request GET clusters | grep <external-host>

# Check Envoy cluster health
istioctl proxy-config endpoint <pod-name> -n <namespace> | grep <external-host>

If the endpoint shows UNHEALTHY, Envoy resolved the DNS but the outlier detection ejected the host. If no endpoint appears at all, DNS resolution is failing. Common fix: ensure your pods can reach an external DNS server, or that CoreDNS is configured to forward queries for the external domain.

TLS Origination Not Working

If you configured TLS origination via a DestinationRule but traffic still fails:

  • Ensure the ServiceEntry port your application calls is declared as HTTP, not TLS. If you set it to TLS, Envoy treats the connection as opaque TLS pass-through and will not originate TLS for a plaintext request.
  • Verify the DestinationRule’s host field exactly matches the ServiceEntry’s hosts entry.
  • Check that the VirtualService (if used) routes to the correct port number.

TCP ServiceEntry Not Intercepting Traffic

For TCP-protocol ServiceEntries without the DNS proxy, Envoy cannot match traffic by hostname. You must either:

  • Set an explicit addresses field with a VIP that your application targets.
  • Enable Istio’s DNS proxy to auto-allocate VIPs.
  • Ensure the destination IP matches what the ServiceEntry resolves to.

Without one of these, TCP traffic goes through the PassthroughCluster and bypasses your ServiceEntry entirely.
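The first option can be sketched as follows. The hostname, resource name, and VIP are all hypothetical (the address comes from a documentation-only range), and the application has to connect to that VIP, for example through a DNS record that points at it.

```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: external-postgres      # hypothetical name
  namespace: default
spec:
  hosts:
    - db.example.com           # hypothetical external database host
  # Explicit VIP that lets Envoy match TCP connections to this entry
  addresses:
    - 192.0.2.10
  location: MESH_EXTERNAL
  ports:
    - number: 5432
      name: tcp-postgres
      protocol: TCP
  resolution: DNS
```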

Frequently Asked Questions

Do I need a ServiceEntry if outboundTrafficPolicy is set to ALLOW_ANY?

You do not need one for connectivity — your services can reach external hosts without it. But you should create ServiceEntries anyway. Without them, outbound traffic goes through the PassthroughCluster, which means no detailed metrics per destination, no access logging with the external hostname, no circuit breaking, no retries, and no timeout policies. A ServiceEntry is the difference between “it works” and “it works reliably with observability.”

What is the difference between protocol TLS and HTTPS in a ServiceEntry port?

TLS tells Envoy to treat the connection as opaque TLS. Envoy reads the SNI header to determine routing but does not decrypt the payload. Use this when your application initiates TLS directly. HTTPS declares that the traffic is HTTP over TLS, but the sidecar still cannot see inside a session the application encrypted itself, so in practice outbound HTTPS ports are handled the same way: SNI-based routing with no Layer 7 features. For external services where the application manages its own TLS, use TLS. Use HTTP with DestinationRule TLS origination when you want the sidecar to handle TLS.

Can I use wildcards in ServiceEntry hosts?

Yes, but with limitations. You can use *.example.com to match any subdomain of example.com. However, wildcard entries only work with resolution: NONE because Envoy cannot perform DNS lookups for wildcard hostnames. This means you lose the ability to apply per-endpoint traffic policies. Wildcard ServiceEntries are best used for broad egress access control rather than fine-grained traffic management.
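A minimal wildcard entry along these lines might look like the sketch below; the resource name and namespace are illustrative.

```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: wildcard-example       # hypothetical name
  namespace: default
spec:
  hosts:
    # Quoted so YAML does not read the leading * as an alias
    - "*.example.com"
  location: MESH_EXTERNAL
  ports:
    - number: 443
      name: tls
      protocol: TLS
  # Envoy forwards to whatever IP the application resolved
  resolution: NONE
```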

How do I configure sticky sessions for an external service behind a ServiceEntry?

Create a ServiceEntry with STATIC or DNS resolution (so Envoy has multiple endpoints), then pair it with a DestinationRule that configures consistentHash under trafficPolicy.loadBalancer. You can hash on an HTTP cookie, header, source IP, or query parameter. The ServiceEntry must expose multiple endpoints for consistent hashing to have any effect. See the “Sticky Sessions with ServiceEntry” section above for a complete YAML example.

How does ServiceEntry interact with NetworkPolicy and Istio AuthorizationPolicy?

A ServiceEntry does not bypass Kubernetes NetworkPolicy. If a NetworkPolicy blocks egress to the external IP, traffic will be dropped at the CNI level regardless of what the ServiceEntry allows. Istio AuthorizationPolicy is enforced by the proxy that receives a request, so restricting which workloads may call a given external host usually means routing that traffic through an egress gateway and applying the policy there. For defense in depth, use ServiceEntry for traffic management and observability, AuthorizationPolicy for workload-level access control, and NetworkPolicy for network-level enforcement.
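The network-level layer could be sketched like this. The pod label and CIDR are hypothetical (the CIDR is a documentation-only range); a real policy would also need egress rules for DNS and for the Istio control plane.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-external-api   # hypothetical name
  namespace: payments
spec:
  # Applies to pods carrying this (assumed) label
  podSelector:
    matchLabels:
      app: payments
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24       # placeholder range for the external API
      ports:
        - protocol: TCP
          port: 443
```

Because NetworkPolicy works on IP ranges rather than hostnames, it suits providers with stable, published address ranges better than those behind rotating CDN addresses.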

Wrapping Up

ServiceEntry is one of the most practical Istio resources you will use in production. It transforms opaque outbound connections into managed, observable, policy-controlled traffic — and it does so without requiring changes to your application code. Start with the basics: create a ServiceEntry for each external dependency, set the correct resolution type, and pair it with a DestinationRule for connection limits and circuit breaking. As you mature, add VirtualServices for retries and timeouts, configure sticky sessions where needed, and enable the DNS proxy for seamless TCP service integration.

The pattern is always the same: register the service, apply policies, observe the traffic. Every external dependency you formalize with a ServiceEntry is one fewer blind spot in your production mesh.

Prometheus Alertmanager vs Grafana Alerting (2026): Architecture, Features, and When to Use Each

Most observability stacks that have been running in production for more than a year end up with alerting spread across two systems: Prometheus Alertmanager handling metric-based alerts and Grafana Alerting managing everything else. Engineers add a Slack integration in Grafana because it is convenient, then realize their Alertmanager routing tree already covers the same service. Before long, the on-call team receives duplicated pages, silencing rules live in two places, and nobody is confident which system is authoritative.

This is the alerting consolidation problem, and it affects teams of every size. The question is straightforward: should you standardize on Prometheus Alertmanager, move everything into Grafana Alerting, or deliberately run both? The answer depends on your datasource mix, your GitOps maturity, and how your organization manages on-call routing. This guide breaks down the architecture, features, and operational trade-offs of each system so you can make a deliberate choice instead of drifting into accidental complexity.

Architecture Overview

Before comparing features, you need to understand how each system fits into the alerting pipeline. They occupy the same logical space — “receive a condition, route a notification” — but they get there from fundamentally different starting points.

Prometheus Alertmanager: The Standalone Receiver

Alertmanager is a dedicated, standalone component in the Prometheus ecosystem. It does not evaluate alert rules itself. Instead, Prometheus (or any compatible sender like Thanos Ruler, Cortex, or Mimir Ruler) evaluates PromQL expressions and pushes firing alerts to the Alertmanager API. Alertmanager then handles deduplication, grouping, inhibition, silencing, and notification delivery.

# Simplified Prometheus → Alertmanager flow
#
# [Prometheus] --evaluates rules--> [firing alerts]
#        |
#        +--POST /api/v2/alerts--> [Alertmanager]
#                                      |
#                          +-----------+-----------+
#                          |           |           |
#                       [Slack]    [PagerDuty]  [Email]

The entire configuration lives in a single YAML file (alertmanager.yml). This includes the routing tree, receiver definitions, inhibition rules, and references to notification templates. There is no database and no UI-driven configuration state, just a config file and an optional local storage directory for notification state and silences. This makes it trivially reproducible and ideal for GitOps workflows.

For high availability, you run multiple Alertmanager instances in a gossip-based cluster. They use a mesh protocol to share silence and notification state, ensuring that failover does not result in duplicate or lost notifications. The HA model is well-understood and has been stable for years.

Grafana Alerting: The Integrated Platform

Grafana Alerting (sometimes called “Grafana Unified Alerting,” introduced in Grafana 8 and significantly matured through Grafana 11 and 12) takes a different architectural approach. It embeds the entire alerting lifecycle — rule evaluation, state management, routing, and notification — inside the Grafana server process. Under the hood, it actually uses a fork of Alertmanager for the routing and notification layer, but this is an implementation detail that is invisible to users.

# Simplified Grafana Alerting flow
#
# [Grafana Server]
#   ├── Rule Evaluation Engine
#   │     ├── queries Prometheus
#   │     ├── queries Loki
#   │     ├── queries CloudWatch
#   │     └── queries any supported datasource
#   │
#   ├── Alert State Manager (internal)
#   │
#   └── Embedded Alertmanager (routing + notifications)
#           |
#           +-----------+-----------+
#           |           |           |
#        [Slack]    [PagerDuty]  [Email]

The critical distinction is that Grafana Alerting evaluates alert rules itself, querying any configured datasource — not just Prometheus. It can fire alerts based on Loki log queries, Elasticsearch searches, CloudWatch metrics, PostgreSQL queries, or any of the 100+ datasource plugins available in Grafana. Rule definitions, contact points, notification policies, and mute timings are stored in the Grafana database (or provisioned via YAML files and the Grafana API).

For high availability in self-hosted environments, Grafana Alerting relies on a shared database and a peer-discovery mechanism between Grafana instances. In Grafana Cloud, HA is fully managed by Grafana Labs.

Feature Comparison

The following table provides a side-by-side comparison of the capabilities that matter most in production alerting systems. Both systems are mature, but they prioritize different things.

| Feature | Prometheus Alertmanager | Grafana Alerting |
|---|---|---|
| Datasources | Prometheus-compatible only (Prometheus, Thanos, Mimir, VictoriaMetrics) | Any Grafana datasource (Prometheus, Loki, Elasticsearch, CloudWatch, SQL databases, etc.) |
| Rule evaluation | External (Prometheus/Ruler evaluates rules and pushes alerts) | Built-in (Grafana evaluates rules directly) |
| Routing tree | Hierarchical YAML-based routing with matchers (match/match_re in older configs), continue, group_by | Notification policies with label matchers, nested policies, mute timings |
| Grouping | Full support via group_by, group_wait, group_interval | Full support via notification policies with equivalent controls |
| Inhibition | Native inhibition rules (suppress alerts when a related alert is firing) | Supported since Grafana 10.3 but less flexible than Alertmanager |
| Silencing | Label-based silences via API or UI, time-limited; recurring mutes via time_intervals in the config file | Mute timings (recurring schedules) and silences (ad-hoc, label-based) |
| Notification channels | Email, Slack, PagerDuty, OpsGenie, VictorOps, webhook, WeChat, Telegram, SNS, Webex | All of the above plus Teams, Discord, Google Chat, LINE, Threema, Grafana OnCall, and more via contact points |
| Templating | Go templates in notification config | Go templates with access to Grafana template variables and functions |
| Multi-tenancy | Not built-in; achieved via separate instances or Mimir Alertmanager | Native multi-tenancy via Grafana organizations and RBAC |
| High availability | Gossip-based cluster (peer mesh, well-proven) | Database-backed HA with peer discovery between Grafana instances |
| Configuration model | Single YAML file, fully declarative | UI + API + provisioning YAML files, stored in database |
| GitOps compatibility | Excellent: config file lives in version control natively | Possible via provisioning files or Terraform provider, but requires extra tooling |
| External alert sources | Any system that can POST to the Alertmanager API | Supported via the Grafana Alerting API (external alerts can be pushed) |
| Managed service | Available via Grafana Cloud (as Mimir Alertmanager), Amazon Managed Prometheus | Available via Grafana Cloud |

Alertmanager Strengths

Alertmanager has been a production staple since 2015. Over a decade of use across thousands of organizations has made it one of the most battle-tested components in the CNCF ecosystem. Here is where it genuinely excels.

Declarative, GitOps-Native Configuration

The entire Alertmanager configuration is a single YAML file. There is no hidden state in a database, no click-driven configuration that someone forgets to document. You check it into Git, review it in a pull request, and deploy it through your CI/CD pipeline like any other infrastructure code. This is a significant operational advantage for teams that have invested in GitOps.

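# alertmanager.yml: everything in one file
global:
  resolve_timeout: 5m
  # Placeholder webhook URL; keep the real one out of Git (for example via slack_api_url_file)
  slack_api_url: "https://hooks.slack.com/services/T00/B00/XXX"

route:
  # Default receiver for anything no child route claims
  receiver: platform-team
  group_by: [alertname, cluster, namespace]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # matchers supersedes the deprecated match / match_re fields
    - matchers:
        - severity = "critical"
      receiver: pagerduty-oncall
      group_wait: 10s
    # Regex matchers are anchored, so this matches exactly payments or checkout
    - matchers:
        - team =~ "payments|checkout"
      receiver: payments-slack
      continue: true

receivers:
  - name: platform-team
    slack_configs:
      - channel: "#platform-alerts"
  - name: pagerduty-oncall
    pagerduty_configs:
      # Left blank here; supply the real integration key from a secret
      - service_key: ""
  - name: payments-slack
    slack_configs:
      - channel: "#payments-oncall"

inhibit_rules:
  # Suppress warnings while a critical alert with the same alertname and cluster is firing
  - source_matchers:
      - severity = "critical"
    target_matchers:
      - severity = "warning"
    equal: [alertname, cluster]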

Every change is auditable. Rollbacks are a git revert away. This matters enormously when you are debugging why an alert did not fire at 3 AM.

Lightweight and Single-Purpose

Alertmanager does one thing: route and deliver notifications. It has no dashboard, no query engine, no datasource plugins. This single-purpose design makes it operationally simple. Resource consumption is minimal — a small Alertmanager instance handles thousands of active alerts on a few hundred megabytes of memory. It starts in milliseconds and requires almost no maintenance.

Mature Inhibition and Routing

Alertmanager’s inhibition rules are first-class citizens. You can suppress downstream warnings when a critical alert is already firing, preventing alert storms from overwhelming your on-call team. The hierarchical routing tree with continue flags allows for nuanced delivery: send to the team channel AND escalate to PagerDuty simultaneously, with different grouping strategies at each level.

Proven High Availability

The gossip-based HA cluster has been stable for years. Running three Alertmanager replicas gives you reliable notification delivery without shared storage, provided Prometheus sends every alert to all replicas (via a static target list or Kubernetes service discovery) rather than through a load balancer. The gossip protocol then deduplicates notifications across instances automatically, which is the hardest part of distributed alerting.
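On the Prometheus side, an HA setup usually means listing every replica explicitly so that each one receives every alert. A minimal prometheus.yml sketch, assuming three replicas reachable under the hypothetical hostnames below:

```yaml
# prometheus.yml (excerpt)
alerting:
  alertmanagers:
    - static_configs:
        # Every replica is listed; Prometheus sends each alert to all of them
        - targets:
            - alertmanager-0.alertmanager.monitoring.svc:9093
            - alertmanager-1.alertmanager.monitoring.svc:9093
            - alertmanager-2.alertmanager.monitoring.svc:9093
```

Sending to all replicas is what lets the cluster deduplicate: a load balancer in front would hide alerts from some instances and weaken that guarantee.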

Grafana Alerting Strengths

Grafana Alerting has matured considerably since its rocky introduction in Grafana 8. By Grafana 11 and 12, it has become a legitimate production alerting platform with capabilities that Alertmanager cannot match on its own.

Multi-Datasource Alert Rules

This is Grafana Alerting’s strongest differentiator. You can write alert rules that query Loki for error log spikes, CloudWatch for AWS resource utilization, Elasticsearch for application errors, or a PostgreSQL database for business metrics — all from the same alerting system. If your observability stack includes more than just Prometheus, this eliminates the need for separate alerting tools per datasource.

# Grafana alert rule provisioning example — alerting on Loki log errors
apiVersion: 1
groups:
  - orgId: 1
    name: application-errors
    folder: Production
    interval: 1m
    rules:
      - uid: loki-error-spike
        title: "High error rate in payment service"
        condition: C
        data:
          - refId: A
            datasourceUid: loki-prod
            model:
              expr: 'sum(rate({app="payment-service"} |= "ERROR" [5m]))'
          - refId: B
            datasourceUid: "__expr__"
            model:
              type: reduce
              expression: A
              reducer: last
          - refId: C
            datasourceUid: "__expr__"
            model:
              type: threshold
              expression: B
              conditions:
                - evaluator:
                    type: gt
                    params: [10]
        for: 5m
        labels:
          severity: warning
          team: payments

This is something Alertmanager simply cannot do. Alertmanager only receives pre-evaluated alerts — it has no concept of datasources or query execution.

Unified UI for Alert Management

Grafana provides a single pane of glass for alert rule creation, visualization, notification policy management, contact point configuration, and silence management. For teams where not every engineer is comfortable editing YAML routing trees, the visual notification policy editor significantly reduces the barrier to entry. You can see the state of every alert rule, its evaluation history, and the exact notification path it will take — all without leaving the browser.

Native Multi-Tenancy and RBAC

Grafana’s organization model and role-based access control extend naturally to alerting. Different teams can manage their own alert rules, contact points, and notification policies within their organization or folder scope, without seeing or interfering with other teams. Achieving this with standalone Alertmanager requires either running separate instances per tenant or using Mimir’s multi-tenant Alertmanager.

Mute Timings and Richer Scheduling

Both systems support ad-hoc, time-limited silences. Grafana Alerting adds mute timings: recurring time-based windows in which notifications are suppressed, managed from the same UI as the rest of the alerting configuration. This is useful for scheduled maintenance windows, business-hours-only alerting, or suppressing non-critical alerts on weekends. Alertmanager offers a comparable mechanism (named time_intervals that routes reference through mute_time_intervals), but it lives in the config file and has no dedicated editor.
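Mute timings can also be provisioned from a file instead of the UI. A minimal sketch of a weekend mute timing, assuming Grafana's alerting file-provisioning format; the name weekends is illustrative:

```yaml
apiVersion: 1
muteTimes:
  - orgId: 1
    name: weekends             # hypothetical name
    time_intervals:
      # Suppress notifications all day Saturday and Sunday
      - weekdays: ["saturday", "sunday"]
        location: "UTC"
```

A mute timing has no effect on its own; a notification policy has to reference it by name before any notifications are suppressed.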

Grafana Cloud as a Managed Option

For teams that want to avoid managing alerting infrastructure entirely, Grafana Cloud provides a fully managed Grafana Alerting stack. This includes HA, state persistence, and notification delivery without any self-hosted components. The Grafana Cloud alerting stack also includes a managed Mimir Alertmanager, which means you can use Prometheus-native alerting rules if you prefer that model while still benefiting from the managed infrastructure.

When to Use Prometheus Alertmanager

Alertmanager is the right choice when the following conditions describe your environment:

  • Your metrics stack is Prometheus-native. If all your alert rules are PromQL expressions evaluated by Prometheus, Thanos Ruler, or Mimir Ruler, Alertmanager is the natural fit. There is no added value in routing those alerts through Grafana.
  • GitOps is non-negotiable. If every infrastructure change must go through a pull request and be fully declarative, Alertmanager’s single-file configuration model is significantly easier to manage than Grafana’s database-backed state. Tools like amtool provide config validation in CI pipelines.
  • You need fine-grained routing with inhibition. Complex routing trees with multiple levels of grouping, inhibition rules, and continue flags are more naturally expressed in Alertmanager’s YAML format. The routing logic has been stable and well-documented for years.
  • You run microservices with per-team routing. If each team owns its routing subtree and the routing logic is complex, Alertmanager’s hierarchical model scales better than UI-driven configuration. Teams can own their section of the config file via CODEOWNERS in Git.
  • You want minimal operational overhead. Alertmanager is a single binary with minimal resource requirements. There is no database to back up, no migrations to run, and no UI framework to keep updated.

When to Use Grafana Alerting

Grafana Alerting is the right choice when these conditions apply:

  • You alert on more than just Prometheus metrics. If you need alert rules based on Loki logs, Elasticsearch queries, CloudWatch metrics, or database queries, Grafana Alerting is the only one of the two that handles all of these natively. The alternative is running separate alerting tools per datasource, which is worse.
  • Your team prefers UI-driven configuration. Not every engineer wants to edit YAML routing trees. If your organization values a visual interface for managing alerts, contact points, and notification policies, Grafana’s UI is a major productivity advantage.
  • You are using Grafana Cloud. If you are already on Grafana Cloud, using its built-in alerting is the path of least resistance. You get HA, managed notification delivery, and a unified experience without running any additional infrastructure.
  • Multi-tenancy is a requirement. If multiple teams need isolated alerting configurations with RBAC, Grafana’s native organization and folder-based access model is significantly easier to set up than running per-tenant Alertmanager instances.
  • You want mute timings for recurring maintenance windows. If your team regularly needs to suppress alerts during scheduled windows (deploy windows, batch processing hours, weekend non-critical suppression), Grafana’s mute timings can be created and edited in the UI, which many teams find more ergonomic than maintaining time_intervals in Alertmanager’s config file or creating silences by hand.

Running Both Together: The Hybrid Pattern

In practice, many production environments run both Alertmanager and Grafana Alerting. This is not necessarily a mistake — it can be a deliberate architectural choice when done with clear boundaries.

Common Hybrid Architecture

The most common pattern looks like this:

  • Prometheus Alertmanager handles all metric-based alerts. PromQL rules are evaluated by Prometheus or a long-term storage ruler (Thanos, Mimir). Alertmanager owns routing, grouping, and notification for these alerts.
  • Grafana Alerting handles non-Prometheus alerts: log-based alerts from Loki, business metrics from SQL datasources, and cross-datasource correlation rules.

The key to making this work without chaos is establishing clear ownership rules:

# Ownership boundaries for hybrid alerting
#
# Prometheus Alertmanager owns:
#   - All PromQL-based alert rules
#   - Infrastructure alerts (node, kubelet, etcd, CoreDNS)
#   - Application SLO/SLI alerts based on metrics
#
# Grafana Alerting owns:
#   - Log-based alert rules (Loki, Elasticsearch)
#   - Business metric alerts (SQL datasources)
#   - Cross-datasource correlation rules
#   - Alerts for teams that prefer UI-driven management
#
# Shared:
#   - Contact points / receivers use the same Slack channels and PagerDuty services
#   - On-call rotations are managed externally (PagerDuty, Grafana OnCall)

Both systems can deliver to the same notification channels. The critical discipline is ensuring that silencing and maintenance windows are applied in both systems when needed. This is the primary operational cost of the hybrid approach.

Grafana as a Viewer for Alertmanager

Even if you use Alertmanager exclusively for routing and notification, Grafana can serve as a read-only viewer. Grafana natively supports connecting to an external Alertmanager datasource, allowing you to see firing alerts, active silences, and alert groups in the Grafana UI. This gives you the operational visibility of Grafana without moving your alerting logic into it.

# Grafana datasource provisioning for external Alertmanager
apiVersion: 1
datasources:
  - name: Alertmanager
    type: alertmanager
    url: http://alertmanager.monitoring.svc:9093
    access: proxy
    jsonData:
      implementation: prometheus

Migration Considerations

If you are moving from one system to the other, here are the practical considerations to plan for.

Migrating from Alertmanager to Grafana Alerting

  • Rule conversion. Your PromQL-based recording and alerting rules defined in Prometheus rule files need to be recreated as Grafana alert rules. Grafana provides a migration tool that can import Prometheus-format rules, but complex expressions may need manual adjustment.
  • Routing tree translation. Alertmanager’s hierarchical routing tree maps to Grafana’s notification policies, but the semantics are not identical. Test the notification routing thoroughly — the continue flag behavior and default routes may differ.
  • Silence and inhibition migration. Active silences are ephemeral and do not need migration. Inhibition rules need to be recreated in Grafana’s format. Recurring maintenance windows should be converted to mute timings.
  • Run in parallel first. The safest migration strategy is to run both systems in parallel for two to four weeks, sending notifications from both, then cutting over when you have confidence in the Grafana setup. Accept the temporary noise of duplicate alerts — it is far cheaper than missing a critical page during migration.

Migrating from Grafana Alerting to Alertmanager

  • Datasource limitation. You can only migrate alerts that are based on Prometheus-compatible datasources. Alerts querying Loki, Elasticsearch, or SQL datasources have no equivalent in Alertmanager — you will need an alternative solution for those.
  • Rule export. Export Grafana alert rules and convert them to Prometheus-format rule files. The Grafana API (GET /api/v1/provisioning/alert-rules) provides structured output that can be transformed with a script.
  • Contact point mapping. Map Grafana contact points to Alertmanager receivers. The configuration format is different, but the concepts are equivalent.
  • State loss. Alertmanager does not carry over Grafana’s alert evaluation history. You start fresh. Plan for a brief period where alerts may re-fire as Prometheus evaluates rules that were previously managed by Grafana.
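The rule export step can be sketched as a small transformation. The function below assumes the exported rule is a dict with `title`, `for`, `labels`, `annotations`, and a `data` list whose entries carry `datasourceUid` and `model.expr`. Treat that layout as an assumption and compare it with your own export. Grafana usually keeps the threshold in a separate expression node, so the sketch flags such rules for manual review rather than guessing the comparison.

```python
import re


def grafana_rule_to_prometheus(rule):
    """Convert one exported Grafana rule dict to a Prometheus alerting rule.

    Returns (converted_rule_or_None, needs_review).
    """
    data = rule.get("data", [])
    # "__expr__" marks Grafana server-side expressions (reduce, threshold, math).
    queries = [
        q for q in data
        if q.get("datasourceUid") != "__expr__" and "expr" in q.get("model", {})
    ]
    has_expressions = any(q.get("datasourceUid") == "__expr__" for q in data)
    if len(queries) != 1:
        return None, True  # multi-query or non-PromQL rules need manual conversion
    words = re.findall(r"[A-Za-z0-9]+", rule["title"])
    converted = {
        "alert": "".join(w[:1].upper() + w[1:] for w in words),
        "expr": queries[0]["model"]["expr"],
        "for": rule.get("for", "0s"),
        "labels": dict(rule.get("labels") or {}),
        "annotations": dict(rule.get("annotations") or {}),
    }
    # The threshold lives in the expression nodes, so it is NOT part of "expr" yet.
    return converted, has_expressions


sample = {  # hypothetical exported rule
    "title": "High error rate",
    "for": "5m",
    "labels": {"severity": "page"},
    "annotations": {"summary": "5xx rate above threshold"},
    "data": [
        {"refId": "A", "datasourceUid": "prom-main",
         "model": {"expr": 'sum(rate(http_requests_total{code=~"5.."}[5m]))'}},
        {"refId": "B", "datasourceUid": "__expr__", "model": {"type": "threshold"}},
    ],
}
converted, needs_review = grafana_rule_to_prometheus(sample)
print(converted["alert"], "needs review:", needs_review)
```

Rules returned with `needs_review` set still need their comparison (for example `> 0.05`) appended to `expr` by hand before they go into a Prometheus rule file.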

Decision Framework

If you want a quick decision path, use this framework:

Start here:
│
├── Do you alert on non-Prometheus datasources (Loki, ES, SQL, CloudWatch)?
│   ├── YES → Grafana Alerting (at least for those datasources)
│   └── NO ↓
│
├── Is GitOps/declarative config a hard requirement?
│   ├── YES → Alertmanager
│   └── NO ↓
│
├── Do you need multi-tenancy with RBAC?
│   ├── YES → Grafana Alerting (or Mimir Alertmanager)
│   └── NO ↓
│
├── Are you on Grafana Cloud?
│   ├── YES → Grafana Alerting (path of least resistance)
│   └── NO ↓
│
└── Default → Alertmanager (simpler, lighter, well-proven)
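The tree above is order-sensitive: each question is only reached if every earlier answer was "no". A minimal sketch of that precedence as a function, with parameter names invented for illustration:

```python
def choose_alerting(non_prometheus_sources, gitops_required,
                    needs_multitenancy_rbac, on_grafana_cloud):
    """Walk the decision tree top to bottom and return the first match."""
    if non_prometheus_sources:
        return "Grafana Alerting (at least for those datasources)"
    if gitops_required:
        return "Alertmanager"
    if needs_multitenancy_rbac:
        return "Grafana Alerting (or Mimir Alertmanager)"
    if on_grafana_cloud:
        return "Grafana Alerting (path of least resistance)"
    return "Alertmanager (simpler, lighter, well-proven)"


# A Prometheus-only team with a hard GitOps requirement, running on Grafana Cloud:
print(choose_alerting(False, True, False, True))
```

Because the GitOps question comes before the Grafana Cloud question, the example resolves to Alertmanager even though the team is on Grafana Cloud.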

For many teams, the honest answer is “both” — Alertmanager for the Prometheus-native metric pipeline, Grafana Alerting for everything else. That is a valid architecture as long as the ownership boundaries are documented and the on-call team knows where to look.

Frequently Asked Questions

What is the difference between Alertmanager and Grafana Alerting?

Prometheus Alertmanager is a standalone notification routing engine that receives pre-evaluated alerts from Prometheus and delivers them to receivers like Slack, PagerDuty, or email. It does not evaluate alert rules itself. Grafana Alerting is an integrated alerting platform embedded in Grafana that both evaluates alert rules (querying any supported datasource) and handles notification routing. Alertmanager is configured entirely via YAML, while Grafana Alerting offers a UI, API, and file-based provisioning. The fundamental difference is scope: Alertmanager handles only the routing and notification phase, while Grafana Alerting handles the full lifecycle from query evaluation to notification.

Can Grafana Alerting replace Prometheus Alertmanager?

Yes, for many use cases. Grafana Alerting can evaluate PromQL rules directly against your Prometheus datasource, so you do not strictly need a separate Alertmanager instance. However, there are scenarios where Alertmanager remains the better choice: heavily GitOps-driven environments, teams that need Alertmanager’s mature inhibition rules, or architectures where Prometheus rule evaluation happens externally (Thanos Ruler, Mimir Ruler) and a dedicated Alertmanager is already in the pipeline. If your only datasource is Prometheus and you value declarative configuration, Alertmanager is still simpler and lighter.

Is Grafana Alertmanager the same as Prometheus Alertmanager?

Not exactly. Grafana Alerting uses a fork of the Prometheus Alertmanager code internally for its notification routing engine, but it is not the same product. The Grafana “Alertmanager” you see in the UI is a managed, embedded component with a different configuration interface (notification policies, contact points, mute timings) compared to the standalone Prometheus Alertmanager (routing tree, receivers, inhibition rules in YAML). Grafana can also connect to an external Prometheus Alertmanager as a datasource, which adds to the confusion. When people refer to “Grafana Alertmanager,” they usually mean the embedded routing engine inside Grafana Alerting.

What are the best alternatives to Prometheus Alertmanager?

The most direct alternative is Grafana Alerting, which can receive and route Prometheus alerts while also supporting other datasources. Beyond that, other options include: Grafana OnCall for on-call management and escalation (often used alongside Alertmanager rather than replacing it), PagerDuty or Opsgenie as managed incident response platforms that can receive alerts directly, Keep as an open-source AIOps alert management platform, and Mimir Alertmanager for multi-tenant environments running Grafana Mimir. The choice depends on whether you need an Alertmanager replacement (routing and notification) or a complementary tool for escalation and incident response.

Should I use Prometheus alerts or Grafana alerts for Kubernetes monitoring?

For Kubernetes monitoring specifically, the kube-prometheus-stack (which includes Prometheus, Alertmanager, and a comprehensive set of pre-built alerting rules) remains the industry standard. These rules are PromQL-based and are designed to work with Alertmanager. If you are deploying kube-prometheus-stack, using Alertmanager for metric-based alerts is the straightforward choice. Add Grafana Alerting on top if you also need to alert on logs (via Loki) or non-metric datasources. For Kubernetes-specific monitoring, the combination of Prometheus rules with Alertmanager for routing is the most mature and well-supported path.

Final Thoughts

The Alertmanager vs Grafana Alerting debate is not really about which tool is better — it is about which tool fits your operational context. Alertmanager is simpler, lighter, and more GitOps-friendly. Grafana Alerting is more versatile, more accessible to UI-oriented teams, and the only option if you need multi-datasource alerting. Running both is perfectly valid when the boundaries are clear.

The worst outcome is not picking the “wrong” tool. The worst outcome is running both accidentally, with overlapping coverage, duplicated notifications, and no clear ownership. Whatever you choose, document the decision, define the ownership boundaries, and make sure your on-call team knows exactly where to go when they need to silence an alert at 3 AM.

Gateway API Provider Support in 2026: A Critical Evaluation

The Kubernetes Gateway API is no longer a future concept; it is the present standard for traffic management. With the announced retirement of the community Ingress-NGINX controller signaling a definitive shift, platform teams and architects now face a critical decision: which Gateway API provider to adopt. The official implementations page lists numerous options, but the real-world picture is one of fragmented support, varying stability, and significant gaps that can derail multi-cluster strategies.

In this evaluation, we move beyond marketing checklists to analyze the practical state of Gateway API support across major cloud providers, ingress controllers, and service meshes. We’ll examine which versions are truly production-ready, where the interoperability pitfalls lie, and what you must account for before standardizing across your infrastructure.

The Gateway API Maturity Spectrum: From Experimental to Standard

Not all Gateway API resources are created equal. The API’s versioning model has two release channels, Standard and Experimental, and features within them carry a support level of Core, Extended, or Implementation-specific. Provider support is therefore inherently uneven: an implementation might fully support the stable Gateway and HTTPRoute resources while offering only partial backing for GRPCRoute and none for the experimental TCPRoute.

This creates a fundamental challenge for architects: designing for the lowest common denominator or accepting provider-specific constraints. The decision hinges on accurately mapping your traffic management requirements (HTTP, TLS termination, gRPC, TCP/UDP load balancing) against what each provider actually delivers in a stable form.

Core API Support: The Foundation

Most providers now support the v1 (GA) versions of the foundational resources:

  • GatewayClass & Gateway: Nearly universal support for v1. These are the control plane resources for provisioning and configuring load balancers.
  • HTTPRoute: Universal support for v1. This is the workhorse for HTTP/HTTPS traffic routing and is considered the most stable.

However, support for other route types reveals the fragmentation:

  • GRPCRoute: GA (v1) upstream since Gateway API v1.1, but provider support still trails HTTPRoute. Critical for modern microservices architectures but not yet universally reliable.
  • TCPRoute & UDPRoute: Patchy support. Both remain in the upstream Experimental channel; some providers implement them on that basis, others ignore them entirely, forcing fallbacks to provider-specific annotations or custom resources.
  • TLSRoute: Frequently tied to specific certificate management integrations (e.g., cert-manager).

Major Provider Deep Dive: Implementation Realities

AWS Elastic Kubernetes Service (EKS)

AWS offers an official Gateway API controller for EKS. Its support is pragmatic but currently limited:

  • Supported Resources: GatewayClass, Gateway, HTTPRoute, and GRPCRoute, served at pre-GA API versions as of early 2024. Upstream, GRPCRoute moved from v1alpha2 straight to v1, so check which version your installed controller release actually serves.
  • Underlying Infrastructure: The AWS Gateway API Controller programs Amazon VPC Lattice, while Gateway API support for Application Load Balancer (ALB) and Network Load Balancer (NLB) is delivered through the separate AWS Load Balancer Controller. Either way, this is a strength (managed AWS services) and a constraint (you inherit the feature limits of the underlying service).
  • Critical Gap: The VPC Lattice-based controller does not support TCPRoute or UDPRoute. If your workload requires raw TCP/UDP load balancing, you must use a Kubernetes Service of type LoadBalancer or a second controller alongside it, creating a disjointed management model.

Google Kubernetes Engine (GKE) & Azure Kubernetes Service (AKS)

Both Google and Azure have integrated Gateway API support directly into their managed Kubernetes offerings, often with a focus on their global load-balancing infrastructures.

  • GKE: Offers the GKE Gateway controller. It supports v1 resources and can provision Google Cloud Global External Load Balancers. Its integration with Google’s certificate management and CDN is a key advantage. However, advanced routing features may require GCP-specific backend configs.
  • AKS: Gateway API support comes through Application Gateway for Containers and its ALB Controller, the successor to the Application Gateway Ingress Controller (AGIC), which handles only Ingress resources. Support for newer route types like GRPCRoute has historically lagged behind other providers.

The pattern here is clear: cloud providers implement the Gateway API as a facade over their existing, proprietary load-balancing products. This ensures stability and performance but can limit portability and advanced cross-provider features.

NGINX & Kong Ingress Controller

These third-party, cluster-based controllers offer a different value proposition: consistency across any Kubernetes distribution, including on-premises.

  • NGINX: The community Ingress-NGINX controller does not implement the Gateway API; the NGINX path forward is NGINX Gateway Fabric, a separate implementation maintained by F5 NGINX. Because it is not constrained by a cloud vendor’s underlying service, it can track the standard and experimental resources more freely, which makes it a strong candidate for hybrid or multi-cloud deployments where feature parity is crucial.
  • Kong Ingress Controller: Kong has been an early and comprehensive supporter of the Gateway API, often implementing features quickly. It leverages Kong Gateway’s extensive plugin ecosystem, which can be a major draw but also introduces vendor lock-in.

Critical Gaps for Enterprise Architects

Beyond checking resource support boxes, several deeper gaps can impact production deployments, especially in complex environments.

1. Multi-Cluster & Hybrid Environment Support

The Gateway API specification covers cross-namespace references through ReferenceGrant, but it defines no equivalent for cross-cluster routing. In practice, very few providers have robust, production-ready multi-cluster stories. Most implementations assume a single cluster. If your architecture spans multiple clusters (for isolation, geography, or failure domains), you will likely need to:

  • Manage separate Gateway resources per cluster.
  • Use an external global load balancer (like a cloud DNS/GSLB) to distribute traffic across cluster-specific gateways.

Both workarounds negate some of the API’s promise of a unified, abstracted configuration.

2. Policy Attachment and Extension Consistency

Gateway API is designed to be extended through policy attachment (e.g., for rate limiting, WAF rules, authentication). There is no standard for how these policies are implemented. One provider might use a custom RateLimitPolicy CRD, while another might rely on annotations or a separate policy engine. This creates massive configuration drift and vendor lock-in, breaking the portability goal.

3. Observability and Debugging Interfaces

While the API defines status fields, the richness of operational data—detailed error logs, granular metrics tied to API resources, distributed tracing integration—varies wildly. Some providers expose deep integration with their monitoring stack; others offer minimal visibility. You must verify that the provider’s observability model meets your SRE team’s needs.

Evaluation Framework: Questions for Your Team

Before selecting a provider, work through this technical checklist:

  1. Route Requirements: Do we need stable support for HTTP only, or also gRPC, TCP, UDP? Is beta support acceptable for non-HTTP routes?
  2. Infrastructure Model: Do we want a cloud-managed load balancer (simpler, less control) or a cluster-based controller (more portable, more operational overhead)?
  3. Multi-Cluster Future: Is our architecture single-cluster today but likely to expand? Does the provider have a credible roadmap for multi-cluster Gateway API?
  4. Policy Needs: What advanced policies (auth, WAF, rate limiting) are required? How does the provider implement them? Can we live with vendor-specific policy CRDs?
  5. Observe & Debug: What logging, metrics, and tracing are exposed for Gateway API resources? Do they integrate with our existing observability platform?
  6. Upgrade Path: What is the provider’s track record for supporting new Gateway API releases? How painful are version upgrades?
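The first checklist item can be made mechanical by comparing the route kinds you need against what each candidate serves and in which release channel. The support matrices below are hypothetical placeholders, not statements about any real provider; fill them in from each implementation's conformance reports or documentation.

```python
# Route kinds this (hypothetical) platform needs.
REQUIRED = {"HTTPRoute", "GRPCRoute", "TCPRoute"}

# Hypothetical support matrices: route kind -> release channel it is served from.
PROVIDERS = {
    "provider-a": {"HTTPRoute": "standard", "GRPCRoute": "standard"},
    "provider-b": {"HTTPRoute": "standard", "GRPCRoute": "standard",
                   "TCPRoute": "experimental"},
}


def evaluate(required, support, allow_experimental=False):
    """Report which required route kinds are missing or only experimental."""
    missing = sorted(kind for kind in required if kind not in support)
    experimental = sorted(
        kind for kind in required if support.get(kind) == "experimental"
    )
    acceptable = not missing and (allow_experimental or not experimental)
    return {"missing": missing, "experimental": experimental, "acceptable": acceptable}


for name, support in PROVIDERS.items():
    print(name, evaluate(REQUIRED, support))
```

The `allow_experimental` switch encodes the second half of the question: whether non-GA support is acceptable for non-HTTP routes is a policy decision, not a technical one.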

Strategic Recommendations

Based on the current landscape, here are pragmatic paths forward:

  • For Single-Cloud Deployments: Start with your cloud provider’s native controller (AWS, GKE, AKS). It’s the path of least resistance and best integration with other cloud services (IAM, certificates, monitoring). Just be acutely aware of its specific limitations regarding unsupported route types.
  • For Hybrid/Multi-Cloud or On-Premises: Standardize on a portable, cluster-based Gateway API controller such as NGINX Gateway Fabric or Kong. The consistency across environments will save significant operational complexity, even if it means forgoing some cloud-native integrations.
  • For Greenfield Projects: Design your applications and configurations against the stable v1 resources (Gateway, HTTPRoute) only. Treat any use of beta/experimental resources as a known risk that may require refactoring later.
  • Always Have an Exit Plan: Isolate Gateway API configuration YAMLs from provider-specific policies and annotations. This modularity will make migration less painful when the next generation of providers emerges or when you need to switch.

The Gateway API’s evolution is a net positive for the Kubernetes ecosystem, offering a far more expressive model than the original Ingress. However, in 2026, the provider landscape is still maturing. Support is broad but not deep, and critical gaps in multi-cluster management and policy portability remain. The successful architect will choose a provider not based on a feature checklist, but based on how well its specific constraints and capabilities align with their organization’s immediate traffic patterns and long-term platform strategy. The era of a universal, write-once-run-anywhere Gateway API configuration is not yet here—but with careful, informed provider selection, you can build a robust foundation for it.

Building a Kubernetes Migration Framework: Lessons from Ingress-NGINX

The recent announcement that the Ingress-NGINX controller is being retired sent a ripple through the Kubernetes community. For many organizations, it’s the first major deprecation of a foundational, widely adopted ecosystem component. While the immediate reaction is often tactical (“What do we replace it with?”), the more valuable long-term question is strategic: “How do we systematically manage this and future migrations?”

This event isn’t an anomaly; it’s a precedent. As Kubernetes matures, core add-ons, APIs, and patterns will evolve or sunset. Platform engineering teams need a repeatable, low-risk framework for navigating these changes. Drawing from the Ingress-NGINX transition and established deployment management principles, we can abstract a robust Kubernetes Migration Framework applicable to any major component, from service meshes to CSI drivers.

Why Ad-Hoc Migrations Fail in Production

Attempting a “big bang” replacement or a series of manual, one-off changes is a recipe for extended downtime, configuration drift, and undetected regression. Production Kubernetes environments are complex systems with deep dependencies:

  • Interdependent Workloads: Multiple applications often share the same ingress controller, relying on specific annotations, custom snippets, or behavioral quirks.
  • Automation and GitOps Dependencies: Helm charts, Kustomize overlays, and ArgoCD/Flux manifests are tightly coupled to the existing component’s API and schema.
  • Observability and Security Integration: Monitoring dashboards, logging parsers, and security policies are tuned for the current implementation.
  • Knowledge Silos: Tribal knowledge about workarounds and specific configurations isn’t documented.

A structured framework mitigates these risks by enforcing discipline, creating clear validation gates, and ensuring the capability to roll back at any point.

The Four-Phase Kubernetes Migration Framework

This framework decomposes the migration into four distinct phases: Assessment, Parallel Run, Cutover, and Decommission. Each phase has defined inputs, activities, and exit criteria.

Phase 1: Deep Assessment & Dependency Mapping

Before writing a single line of new configuration, understand the full scope. The goal is to move from “we use Ingress-NGINX” to a precise inventory of how it’s used.

  • Inventory All Ingress Resources: Use kubectl get ingress --all-namespaces as a starting point, but go deeper.
  • Analyze Annotation Usage: Script an analysis to catalog every annotation in use (e.g., nginx.ingress.kubernetes.io/rewrite-target, nginx.ingress.kubernetes.io/configuration-snippet). This reveals functional dependencies.
  • Map to Backend Services: For each Ingress, identify the backend Services and Namespaces. This highlights critical applications and potential blast radius.
  • Review Customizations: Document any custom ConfigMaps for main NGINX configuration, custom template patches, or modifications to the controller deployment itself.
  • Evaluate Alternatives: Based on the inventory, evaluate candidate replacements (e.g., Gateway API with a compatible implementation, another Ingress controller like Emissary-ingress or Traefik). The Google Cloud migration framework provides a useful decision tree for ingress-specific migrations.

The output of this phase is a migration manifesto: a concrete list of what needs to be converted, grouped by complexity and criticality.
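The annotation analysis can be sketched as a short script over the output of `kubectl get ingress --all-namespaces -o json`. The sample data below is invented so the block runs on its own; in practice you would load the real JSON with `json.load`. The `annotation_inventory` name and the report shape are illustrative choices.

```python
from collections import Counter

NGINX_PREFIX = "nginx.ingress.kubernetes.io/"


def annotation_inventory(ingress_list):
    """Count controller-specific annotations and record which Ingresses use them."""
    counts = Counter()
    users = {}
    for item in ingress_list.get("items", []):
        meta = item.get("metadata", {})
        for key in meta.get("annotations") or {}:
            if key.startswith(NGINX_PREFIX):
                counts[key] += 1
                users.setdefault(key, []).append(
                    f"{meta.get('namespace')}/{meta.get('name')}"
                )
    return counts, users


# Invented sample in the shape of a Kubernetes List object.
sample = {"items": [
    {"metadata": {"namespace": "shop", "name": "web", "annotations": {
        "nginx.ingress.kubernetes.io/rewrite-target": "/",
        "cert-manager.io/cluster-issuer": "letsencrypt"}}},
    {"metadata": {"namespace": "shop", "name": "api", "annotations": {
        "nginx.ingress.kubernetes.io/rewrite-target": "/$1",
        "nginx.ingress.kubernetes.io/configuration-snippet": "more_set_headers ..."}}},
    {"metadata": {"namespace": "ops", "name": "grafana"}},
]}

counts, users = annotation_inventory(sample)
for key, n in counts.most_common():
    print(f"{n:3d}  {key}  used by {', '.join(users[key])}")
```

Sorting by frequency gives a rough conversion order: widely used annotations need an equivalent in the target controller before anything can move, while rare ones such as configuration snippets usually mark the hard cases.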

Phase 2: Phased Rollout & Parallel Run

This is the core of a low-risk migration. Instead of replacing, you run the new and old systems in parallel, shifting traffic gradually. For ingress, this often means installing the new controller alongside the old one.

  • Dual Installation: Deploy the new ingress controller in the same cluster, configured with a distinct ingress class (e.g., ingressClassName: gateway vs. nginx).
  • Create Canary Ingress Resources: For a low-risk application, create a parallel Ingress or Gateway resource pointing to the new controller. Use techniques like managed deployments with canary patterns to control exposure.
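    # Example: a Gateway API HTTPRoute that sends 10% of traffic to a canary backend.
    # A weight only has an effect relative to the other backendRefs in the same rule.
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: app-canary
    spec:
      parentRefs:
        - name: company-gateway
      rules:
        - backendRefs:
            - name: app-service          # existing stable backend
              port: 8080
              weight: 90
            - name: app-service-canary   # hypothetical canary Service
              port: 8080
              weight: 10                 # start with 10% of traffic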

  • Validate Equivalency: Use traffic mirroring (if supported) or direct synthetic testing against both ingress paths. Compare logs, response headers, latency, and error rates.
  • Iterate and Expand: Gradually increase traffic weight or add more applications to the new stack, group by group, based on the assessment from Phase 1.

This phase relies heavily on your observability stack. Dashboards comparing error rates, latency (p50, p99), and throughput between the old and new paths are essential.
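The comparison those dashboards make can also run as an automated gate. The sketch below uses nearest-rank percentiles over raw latency samples. The thresholds (20% latency headroom, 0.5 percentage points of extra errors) and the input shape are assumptions to tune, and real numbers would come from your metrics backend rather than inline lists.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def compare_paths(old, new, max_latency_ratio=1.2, max_error_delta=0.005):
    """Compare the new ingress path against the old one on latency and error rate."""
    report = {}
    for name, pct in (("p50", 50), ("p99", 99)):
        old_value = percentile(old["latencies_ms"], pct)
        new_value = percentile(new["latencies_ms"], pct)
        report[name] = {"old": old_value, "new": new_value,
                        "ok": new_value <= old_value * max_latency_ratio}
    old_err = old["errors"] / old["requests"]
    new_err = new["errors"] / new["requests"]
    report["error_rate"] = {"old": old_err, "new": new_err,
                            "ok": new_err - old_err <= max_error_delta}
    # Evaluated before "pass" is stored, so only the three checks are inspected.
    report["pass"] = all(check["ok"] for check in report.values())
    return report


old_path = {"latencies_ms": [10, 12, 11, 13, 50, 12, 11, 10, 14, 12],
            "requests": 1000, "errors": 5}
new_path = {"latencies_ms": [11, 12, 12, 14, 55, 13, 11, 10, 15, 12],
            "requests": 1000, "errors": 7}
print(compare_paths(old_path, new_path))
```

Expressing the tolerance as a ratio for latency and an absolute delta for errors keeps the gate meaningful for both fast and slow services.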

Phase 3: Validation & Automated Cutover

The cutover is not a manual event. It’s the final step in a validation process.

  • Define Validation Tests: Create a suite of tests that must pass before full cutover. This includes:
    • Smoke tests for all critical user journeys.
    • Load tests to verify performance under expected traffic patterns.
    • Security scan validation (e.g., no unintended ports open).
    • Compliance checks (e.g., specific headers are present).
  • Automate the Switch: For each application, the cutover is ultimately a change in its Ingress or Gateway resource. This should be done via your GitOps pipeline. Update the source manifests (e.g., change the ingressClassName), merge, and let automation apply it. This ensures the state is declarative and recorded.
  • Maintain Rollback Capacity: The old system must remain operational and routable (with reduced capacity) during this phase. The GitOps rollback is simply reverting the manifest change.
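As one example of a validation test from the list above, a header compliance check can be a few lines. The required headers and expected substrings here are placeholders; the function takes response headers as a plain dict so it can sit behind whatever HTTP client your test suite already uses.

```python
def header_problems(response_headers, required):
    """Return a list of problems; an empty list means the check passes.

    required maps header name -> substring that must appear in the value,
    or None when only the header's presence matters.
    """
    present = {name.lower(): value for name, value in response_headers.items()}
    problems = []
    for name, expected in required.items():
        actual = present.get(name.lower())
        if actual is None:
            problems.append(f"{name}: missing")
        elif expected is not None and expected not in actual:
            problems.append(f"{name}: expected '{expected}' in '{actual}'")
    return problems


# Placeholder policy and a response captured from the new ingress path.
REQUIRED_HEADERS = {
    "Strict-Transport-Security": "max-age=",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": None,
}
observed = {"strict-transport-security": "max-age=31536000",
            "X-Frame-Options": "DENY"}
print(header_problems(observed, REQUIRED_HEADERS))
```

Running the same check against both the old and the new path catches headers that the old controller added implicitly, for example through a global ConfigMap setting.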

Phase 4: Observability & Decommission

Once all traffic has been migrated and has run cleanly for a sustained period (e.g., 72 hours at minimum, ideally a full business cycle), you can decommission the old component.

  • Monitor Aggressively: Keep a close watch on all key metrics for at least one full business cycle (a week).
  • Remove Old Resources: Delete the old controller’s Deployment, Service, ConfigMaps, IngressClass, and any admission webhook configuration it registered, plus its CRDs if nothing else depends on them.
  • Clean Up Auxiliary Artifacts: Remove old RBAC bindings, service accounts, and any custom monitoring alerts or dashboards specific to the old component.
  • Document Lessons Learned: Update runbooks and architecture diagrams. Note any surprises, gaps in the process, or validation tests that were particularly valuable.

Key Principles for a Resilient Framework

Beyond the phases, these principles should guide your framework’s design:

  • Always Maintain Rollback Capability: Every step should be reversible with minimal disruption. This is a core tenet of managing Kubernetes deployments.
  • Leverage GitOps for State Management: All desired state changes (Ingress resources, controller deployments) must flow through version-controlled manifests. This provides an audit trail, consistency, and the simplest rollback mechanism (git revert).
  • Validate with Production Traffic Patterns: Synthetic tests are insufficient. Use canary weights and traffic mirroring to validate with real user traffic in a controlled manner.
  • Communicate Transparently: Platform teams should maintain a clear migration status page for internal stakeholders, showing which applications have been migrated, which are in progress, and the overall timeline.

Conclusion: Building a Migration-Capable Platform

The deprecation of Ingress-NGINX is a wake-up call. The next major change is a matter of “when,” not “if.” By investing in a structured migration framework now, platform teams transform a potential crisis into a manageable, repeatable operational procedure.

This framework (Assess, Run in Parallel, Validate and Cut Over, Decommission) abstracts the specific lessons from the ingress migration into a generic pattern. It can be applied to migrating from PodSecurityPolicies to Pod Security Standards, from a deprecated CSI driver, or from one service mesh to another. The tools (GitOps, canary deployments, observability) are already in your stack. The value is in stitching them together into a disciplined process that ensures platform evolution doesn’t compromise platform stability.

Start by documenting this framework as a runbook template. Then, apply it to your next significant component update, even a minor one, to refine the process. When the next major deprecation announcement lands in your inbox, you’ll be ready.

Kubernetes Housekeeping: How to Clean Up Orphaned ConfigMaps and Secrets

If you’ve been running Kubernetes clusters for any meaningful amount of time, you’ve likely encountered a familiar problem: orphaned ConfigMaps and Secrets piling up in your namespaces. These abandoned resources don’t just clutter your cluster—they introduce security risks, complicate troubleshooting, and can even impact cluster performance as your resource count grows.

The reality is that Kubernetes doesn’t automatically clean up ConfigMaps and Secrets when the workloads that reference them are deleted. This gap in Kubernetes’ native garbage collection creates a housekeeping problem that every production cluster eventually faces. In this article, we’ll explore why orphaned resources happen, how to detect them, and most importantly, how to implement sustainable cleanup strategies that prevent them from accumulating in the first place.

Understanding the Orphaned Resource Problem

What Are Orphaned ConfigMaps and Secrets?

Orphaned ConfigMaps and Secrets are configuration resources that no longer have any active references from Pods, Deployments, StatefulSets, or other workload resources in your cluster. They typically become orphaned when:

  • Applications are updated and new ConfigMaps are created while old ones remain
  • Deployments are deleted but their associated configuration resources aren’t
  • Failed rollouts leave behind unused configuration versions
  • Development and testing workflows create temporary resources that never get cleaned up
  • CI/CD pipelines generate unique ConfigMap names (often with hash suffixes) on each deployment

Why This Matters for Production Clusters

While a few orphaned ConfigMaps might seem harmless, the problem compounds over time and introduces real operational challenges:

Security Risks: Orphaned Secrets can contain outdated credentials, API keys, or certificates that should no longer be accessible. If these aren’t removed, they remain attack vectors for unauthorized access—especially problematic if RBAC policies grant broad read access to Secrets within a namespace.

Cluster Bloat: Kubernetes stores these resources in etcd, your cluster’s backing store. As the number of orphaned resources grows, etcd size increases, potentially impacting cluster performance and backup times. In extreme cases, this can contribute to etcd performance degradation or even hit storage quotas.

Operational Complexity: When troubleshooting issues or reviewing configurations, sifting through dozens of unused ConfigMaps makes it harder to identify which resources are actually in use. This “configuration noise” slows down incident response and increases cognitive load for your team.

Cost Implications: While individual ConfigMaps are small, at scale they contribute to storage costs and can trigger alerts in cost monitoring systems, especially in multi-tenant environments where resource quotas matter.

Detecting Orphaned ConfigMaps and Secrets

Before you can clean up orphaned resources, you need to identify them. Let’s explore both manual detection methods and automated tooling approaches.

Manual Detection with kubectl

The simplest approach uses kubectl to cross-reference ConfigMaps and Secrets against active workload resources. Here’s a basic script to identify potentially orphaned ConfigMaps:

#!/bin/bash
# detect-orphaned-configmaps.sh
# Identifies ConfigMaps not referenced by any Pod that currently exists in the namespace

NAMESPACE=${1:-default}

echo "Checking for orphaned ConfigMaps in namespace: $NAMESPACE"
echo "---"

# Get all ConfigMaps in the namespace
CONFIGMAPS=$(kubectl get configmaps -n "$NAMESPACE" -o jsonpath='{.items[*].metadata.name}')

# Fetch the Pod list once instead of once per ConfigMap
PODS_JSON=$(kubectl get pods -n "$NAMESPACE" -o json)

for cm in $CONFIGMAPS; do
    # Skip kube-root-ca.crt as it's system-managed
    if [[ "$cm" == "kube-root-ca.crt" ]]; then
        continue
    fi

    # Check if any Pod references this ConfigMap.
    # any(...) returns false for an empty list, so a Pod with no env entries
    # no longer hides a match in envFrom (a bare "a or b" yields nothing when b is empty).
    REFERENCED=$(jq -r --arg cm "$cm" '.items[] |
        (.spec.containers + (.spec.initContainers // [])) as $ctrs |
        select(
            any(.spec.volumes[]?; .configMap.name == $cm) or
            any(.spec.volumes[]?.projected.sources[]?; .configMap.name == $cm) or
            any($ctrs[].env[]?; .valueFrom.configMapKeyRef.name == $cm) or
            any($ctrs[].envFrom[]?; .configMapRef.name == $cm)
        ) | .metadata.name' <<< "$PODS_JSON" | head -1)

    if [[ -z "$REFERENCED" ]]; then
        echo "Orphaned: $cm"
    fi
done

A similar script for Secrets would look like this:

#!/bin/bash
# detect-orphaned-secrets.sh
# Identifies Secrets not referenced by any Pod that currently exists in the namespace

NAMESPACE=${1:-default}

echo "Checking for orphaned Secrets in namespace: $NAMESPACE"
echo "---"

SECRETS=$(kubectl get secrets -n "$NAMESPACE" -o jsonpath='{.items[*].metadata.name}')

# Fetch the Pod list once instead of once per Secret
PODS_JSON=$(kubectl get pods -n "$NAMESPACE" -o json)

for secret in $SECRETS; do
    # Skip service account tokens and system secrets
    SECRET_TYPE=$(kubectl get secret "$secret" -n "$NAMESPACE" -o jsonpath='{.type}')
    if [[ "$SECRET_TYPE" == "kubernetes.io/service-account-token" ]]; then
        continue
    fi

    # Check if any Pod references this Secret.
    # any(...) returns false for an empty list, so a missing env or envFrom
    # section cannot suppress a match found elsewhere in the Pod spec.
    REFERENCED=$(jq -r --arg secret "$secret" '.items[] |
        (.spec.containers + (.spec.initContainers // [])) as $ctrs |
        select(
            any(.spec.volumes[]?; .secret.secretName == $secret) or
            any($ctrs[].env[]?; .valueFrom.secretKeyRef.name == $secret) or
            any($ctrs[].envFrom[]?; .secretRef.name == $secret) or
            any(.spec.imagePullSecrets[]?; .name == $secret)
        ) | .metadata.name' <<< "$PODS_JSON" | head -1)

    if [[ -z "$REFERENCED" ]]; then
        echo "Orphaned: $secret"
    fi
done

Important caveat: These scripts only check Pods that currently exist in the namespace. They won’t catch ConfigMaps or Secrets referenced by Deployments, StatefulSets, or DaemonSets that currently have zero replicas, by CronJobs between runs, or by non-Pod consumers such as an Ingress TLS secretName. For production use, you’ll want to check against all workload resource types.
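Checking workload templates instead of live Pods closes most of that gap, because a Deployment scaled to zero still carries its Pod template. The sketch below works on already-parsed objects, for example the `items` of `kubectl get deploy,sts,ds,cronjob -o json`. The sample data is invented, and covering CronJobs is an addition beyond the three workload types named above.

```python
def pod_spec_of(workload):
    """Return the Pod spec embedded in a workload object's template."""
    spec = workload.get("spec", {})
    if workload.get("kind") == "CronJob":
        spec = spec.get("jobTemplate", {}).get("spec", {})
    return spec.get("template", {}).get("spec", {})


def referenced_configmaps(pod_spec):
    """Collect ConfigMap names referenced via volumes, env, or envFrom."""
    names = set()
    for volume in pod_spec.get("volumes") or []:
        if "configMap" in volume:
            names.add(volume["configMap"]["name"])
        for source in (volume.get("projected") or {}).get("sources") or []:
            if "configMap" in source:
                names.add(source["configMap"]["name"])
    containers = (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])
    for container in containers:
        for env in container.get("env") or []:
            ref = (env.get("valueFrom") or {}).get("configMapKeyRef")
            if ref:
                names.add(ref["name"])
        for source in container.get("envFrom") or []:
            if "configMapRef" in source:
                names.add(source["configMapRef"]["name"])
    return names


def orphaned_configmaps(configmap_names, workloads):
    used = set()
    for workload in workloads:
        used |= referenced_configmaps(pod_spec_of(workload))
    return sorted(set(configmap_names) - used - {"kube-root-ca.crt"})


# Invented sample: a Deployment scaled to zero and a CronJob with no running Pod.
workloads = [
    {"kind": "Deployment", "spec": {"replicas": 0, "template": {"spec": {
        "containers": [{"name": "app", "envFrom": [{"configMapRef": {"name": "app-config"}}]}]}}}},
    {"kind": "CronJob", "spec": {"jobTemplate": {"spec": {"template": {"spec": {
        "containers": [{"name": "job"}],
        "volumes": [{"name": "cfg", "configMap": {"name": "batch-config"}}]}}}}}},
]
names = ["app-config", "batch-config", "old-config-7f9c", "kube-root-ca.crt"]
print(orphaned_configmaps(names, workloads))
```

The same traversal works for Secrets by swapping the field names (`secret.secretName`, `secretKeyRef`, `secretRef`, plus `imagePullSecrets` on the Pod spec).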

Automated Detection with Specialized Tools

Several open-source tools have emerged to solve this problem more comprehensively:

Kor: Comprehensive Unused Resource Detection

Kor is a purpose-built tool for finding unused resources across your Kubernetes cluster. It checks not just ConfigMaps and Secrets, but also PVCs, Services, and other resource types.

# Install Kor
brew install kor

# Scan all supported resource types in one namespace
kor all --include-namespaces production --output json

# Check specific resource types
kor configmap --include-namespaces production
kor secret --exclude-namespaces kube-system,kube-public

Kor works by analyzing resource relationships and identifying anything without dependent objects. It’s particularly effective because it understands Kubernetes resource hierarchies and checks against Deployments, StatefulSets, and DaemonSets—not just running Pods.

Popeye: Cluster Sanitization Reports

Popeye scans your cluster and generates reports on resource health, including orphaned resources. While broader in scope than just ConfigMap cleanup, it provides valuable context:

# Install Popeye
brew install derailed/popeye/popeye

# Scan cluster
popeye --output json --save

# Focus on specific namespace
popeye --namespace production

Custom Controllers with Kubernetes APIs

For more sophisticated detection, you can build custom controllers using client-go that continuously monitor for orphaned resources. This approach works well when integrated with your existing observability stack:

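package orphans

import (
    "context"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// Sketch using client-go. markReferenced records every ConfigMap named in a Pod spec.
func markReferenced(spec corev1.PodSpec, referenced map[string]bool) {
    for _, v := range spec.Volumes {
        if v.ConfigMap != nil {
            referenced[v.ConfigMap.Name] = true
        }
    }
    for _, c := range spec.Containers {
        for _, e := range c.Env {
            if e.ValueFrom != nil && e.ValueFrom.ConfigMapKeyRef != nil {
                referenced[e.ValueFrom.ConfigMapKeyRef.Name] = true
            }
        }
        for _, ef := range c.EnvFrom {
            if ef.ConfigMapRef != nil {
                referenced[ef.ConfigMapRef.Name] = true
            }
        }
    }
}

func detectOrphanedConfigMaps(ctx context.Context, cs kubernetes.Interface, ns string) ([]string, error) {
    referenced := make(map[string]bool)
    deploys, err := cs.AppsV1().Deployments(ns).List(ctx, metav1.ListOptions{})
    if err != nil {
        return nil, err
    }
    for _, d := range deploys.Items {
        markReferenced(d.Spec.Template.Spec, referenced)
    }
    // ... repeat for StatefulSets, DaemonSets, and CronJobs
    cms, err := cs.CoreV1().ConfigMaps(ns).List(ctx, metav1.ListOptions{})
    if err != nil {
        return nil, err
    }
    orphaned := []string{}
    for _, cm := range cms.Items {
        if !referenced[cm.Name] {
            orphaned = append(orphaned, cm.Name)
        }
    }
    return orphaned, nil
}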

Prevention Strategies: Stop Orphans Before They Start

The best cleanup strategy is prevention. By implementing proper resource management patterns from the beginning, you can minimize orphaned resources in the first place.

Use Owner References for Automatic Cleanup

Kubernetes provides a built-in mechanism for resource lifecycle management through owner references. When properly configured, child resources are automatically deleted when their owner is removed.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
  ownerReferences:
    - apiVersion: apps/v1
      kind: Deployment
      name: myapp
      uid: d9607e19-f88f-11e6-a518-42010a800195
      controller: true
      blockOwnerDeletion: true
data:
  app.properties: |
    database.url=postgres://db:5432

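The uid must match the live owner object, and the owner must live in the same namespace as the ConfigMap (or be cluster-scoped). Helm and Kustomize do not set owner references themselves: Helm tracks the resources that belong to a release and removes them on helm uninstall or when they drop out of the chart on upgrade, while Kustomize only renders manifests. Owner references are typically set by controllers and operators, or patched in by your own tooling. Declarative workflows still tend to leave fewer orphaned resources than imperative ones because something keeps track of what was applied.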
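Because the uid in an owner reference has to match the live owner, it usually needs to be looked up at apply time rather than hard-coded. A minimal sketch, assuming a Deployment named myapp already exists in the same namespace as the ConfigMap; owner_ref_patch is a hypothetical helper, not a kubectl feature.

```shell
# owner_ref_patch KIND NAME UID
# Prints a JSON merge patch that sets a single ownerReference.
# apiVersion is hard-coded to apps/v1, so this only suits Deployments,
# StatefulSets, and DaemonSets.
owner_ref_patch() {
    printf '{"metadata":{"ownerReferences":[{"apiVersion":"apps/v1","kind":"%s","name":"%s","uid":"%s"}]}}\n' \
        "$1" "$2" "$3"
}

# Example wiring (requires a cluster; not run here):
#   uid=$(kubectl get deployment myapp -n production -o jsonpath='{.metadata.uid}')
#   kubectl patch configmap app-config -n production --type merge \
#       -p "$(owner_ref_patch Deployment myapp "$uid")"
```

A merge patch replaces the whole ownerReferences list, so this sketch is only appropriate when the ConfigMap has no other owners.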

Implement Consistent Labeling Standards

Labels make it much easier to identify resource relationships and track ownership:

apiVersion: v1
kind: ConfigMap
metadata:
  name: api-gateway-config-v2
  labels:
    app: api-gateway
    component: configuration
    version: v2
    managed-by: argocd
    owner: platform-team
data:
  config.yaml: |
    # configuration here

With consistent labeling, you can easily query for ConfigMaps associated with specific applications:

# Find all ConfigMaps for a specific app
kubectl get configmaps -l app=api-gateway

# Clean up old versions
kubectl delete configmaps -l app=api-gateway,version=v1

Adopt GitOps Practices

GitOps tools like ArgoCD and Flux excel at preventing orphaned resources because they maintain a clear desired state:

  • Declarative management: All resources are defined in Git
  • Automatic pruning: Tools can detect and remove resources not defined in Git
  • Audit trail: Git history shows when and why resources were created or deleted

ArgoCD’s sync policies can automatically prune resources:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
spec:
  # project, source, and destination omitted for brevity
  syncPolicy:
    automated:
      prune: true  # Remove resources not in Git
      selfHeal: true

Use Kustomize ConfigMap Generators with Hashes

Kustomize’s ConfigMap generator feature appends content hashes to ConfigMap names, ensuring that configuration changes trigger new ConfigMaps:

# kustomization.yaml
configMapGenerator:
  - name: app-config
    files:
      - config.properties
generatorOptions:
  disableNameSuffixHash: false  # Include hash in name

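This creates ConfigMaps with names like app-config-dk9g72hk5f, and Kustomize rewrites the references in your workloads to match. When you update the configuration, Kustomize generates a new ConfigMap with a different hash, but the old one stays in the cluster unless something removes it. Pruning is a feature of kubectl apply rather than of Kustomize: with the --prune flag (still marked alpha) and a label selector, kubectl deletes previously applied objects that match the selector and are no longer in the rendered output: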

kubectl apply --prune -k ./overlays/production \
  -l app=myapp
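Before relying on pruning, it can help to preview which generated ConfigMaps are stale. The sketch below assumes the generator's suffix is a single lowercase alphanumeric token appended after a hyphen, as in the example name above; stale_generated is a hypothetical helper, not a Kustomize or kubectl command.

```shell
# stale_generated BASE CURRENT
# Reads ConfigMap names on stdin and prints those that look like
# BASE-<suffix> but are not the CURRENT generated name.
stale_generated() {
    awk -v base="$1-" -v current="$2" '
        index($0, base) == 1 && $0 != current {
            suffix = substr($0, length(base) + 1)
            # Assumption: the suffix contains only lowercase letters and digits
            if (suffix ~ /^[a-z0-9]+$/) print
        }'
}

# Example wiring (requires a cluster; not run here):
#   kubectl get cm -n production -o name | sed 's|^configmap/||' \
#       | stale_generated app-config app-config-dk9g72hk5f
```

The pattern is deliberately loose, so a hand-written name such as app-config-v2 would also be listed; treat the output as a review list rather than a delete list.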

Set Resource Quotas

While quotas don’t prevent orphans, they create backpressure that forces teams to clean up:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: config-quota
  namespace: production
spec:
  hard:
    configmaps: "50"
    secrets: "50"

When teams hit quota limits, they’re incentivized to audit and remove unused resources.
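To surface that pressure before a deployment fails, a team could compare used against hard counts from the quota's status. The following is a rough sketch; the function names and the 80 percent threshold are assumptions chosen for illustration.

```shell
# quota_pct USED HARD
# Prints the percentage of an object-count quota in use, rounded down.
quota_pct() {
    if [ "$2" -eq 0 ]; then
        echo 100   # a zero quota is treated as fully consumed
    else
        echo $(( $1 * 100 / $2 ))
    fi
}

# quota_near_limit USED HARD THRESHOLD
# Succeeds (exit 0) when usage is at or above THRESHOLD percent.
quota_near_limit() {
    [ "$(quota_pct "$1" "$2")" -ge "$3" ]
}

# Example wiring (requires a cluster; not run here):
#   used=$(kubectl get resourcequota config-quota -n production -o jsonpath='{.status.used.configmaps}')
#   hard=$(kubectl get resourcequota config-quota -n production -o jsonpath='{.status.hard.configmaps}')
#   quota_near_limit "$used" "$hard" 80 && echo "production is close to its ConfigMap quota"
```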

Cleanup Strategies for Existing Orphaned Resources

For clusters that already have accumulated orphaned ConfigMaps and Secrets, here are practical cleanup approaches.

One-Time Manual Cleanup

For immediate cleanup, combine detection scripts with kubectl delete:

# Dry run first - review what would be deleted
./detect-orphaned-configmaps.sh production > orphaned-cms.txt
cat orphaned-cms.txt

# After manual review, delete the listed ConfigMaps
awk '/^Orphaned:/ {print $2}' orphaned-cms.txt | while read -r cm; do
    kubectl delete configmap "$cm" -n production
done

Critical warning: Always do a dry run and manual review first. Some ConfigMaps might be referenced by workloads that aren’t currently running but will scale up later (HPA scaled to zero, CronJobs, etc.).

Scheduled Cleanup with CronJobs

For ongoing maintenance, deploy a Kubernetes CronJob that runs cleanup scripts periodically:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: configmap-cleanup
  namespace: kube-system
spec:
  schedule: "0 2 * * 0"  # Weekly at 2 AM Sunday
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleanup-sa
          containers:
          - name: cleanup
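            image: bitnami/kubectl:latest  # must provide bash, kubectl, and jq; pin a tag in production
            command:
            - /bin/bash
            - -c
            - |
              set -o pipefail
              echo "Starting ConfigMap cleanup..."

              for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
                # Never touch system namespaces
                case "$ns" in
                  kube-system|kube-public|kube-node-lease) continue ;;
                esac
                echo "Checking namespace: $ns"

                # Collect ConfigMaps referenced by workload Pod templates.
                # If the query fails, skip the namespace rather than treating
                # every ConfigMap as orphaned. CronJobs are not covered here.
                if ! REFERENCED_CMS=$(kubectl get deploy,sts,ds -n "$ns" -o json | \
                  jq -r '.items[].spec.template.spec |
                  [.volumes[]?.configMap.name,
                   .containers[].env[]?.valueFrom.configMapKeyRef.name,
                   .containers[].envFrom[]?.configMapRef.name] |
                  .[] | select(. != null)' | sort -u); then
                  echo "Could not list workloads in $ns; skipping" >&2
                  continue
                fi

                ALL_CMS=$(kubectl get cm -n "$ns" -o jsonpath='{.items[*].metadata.name}')

                for cm in $ALL_CMS; do
                  if [[ "$cm" == "kube-root-ca.crt" ]]; then
                    continue
                  fi

                  # -x matches the whole line, -F treats the name literally
                  if ! grep -qxF -- "$cm" <<< "$REFERENCED_CMS"; then
                    echo "Deleting orphaned ConfigMap: $cm in namespace: $ns"
                    kubectl delete cm "$cm" -n "$ns"
                  fi
                done
              done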
          restartPolicy: OnFailure
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cleanup-sa
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cleanup-role
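rules:
# Namespaces are only enumerated, never deleted
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list"]
# The script deletes ConfigMaps only, so Secrets are not granted
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list"]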
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cleanup-binding
subjects:
- kind: ServiceAccount
  name: cleanup-sa
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cleanup-role
  apiGroup: rbac.authorization.k8s.io

Security consideration: This CronJob needs cluster-wide permissions to read workloads and delete ConfigMaps. Review and adjust the RBAC permissions based on your security requirements. Consider limiting to specific namespaces if you don’t need cluster-wide cleanup.

Integration with CI/CD Pipelines

Build cleanup into your deployment workflows. Here’s an example GitLab CI job:

cleanup_old_configs:
  stage: post-deploy
  image: bitnami/kubectl:latest
  script:
    - |
      # Option A: delete every ConfigMap whose version label differs from the
      # tag just deployed. This also removes rollback targets.
      kubectl delete configmap -n production \
        -l app=myapp,version!=v${CI_COMMIT_TAG}

    - |
      # Option B (instead of A): keep only the 3 newest ConfigMaps by creation
      # time. head -n -3 requires GNU head.
      kubectl get configmap -n production \
        -l app=myapp \
        --sort-by=.metadata.creationTimestamp \
        -o name | head -n -3 | xargs -r kubectl delete -n production
  only:
    - tags
  when: on_success
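The "keep the newest N" step in the job above depends on negative line counts in head, which not every image supports. A portable sketch of the same filter using awk follows; drop_last is a hypothetical helper name.

```shell
# drop_last N
# Reads lines on stdin and prints all but the final N lines.
# Input is expected oldest-first, so the N newest entries are kept.
drop_last() {
    awk -v n="$1" '
        { buf[NR] = $0 }
        END { for (i = 1; i <= NR - n; i++) print buf[i] }'
}

# Example wiring (requires a cluster; not run here):
#   kubectl get configmap -n production -l app=myapp \
#       --sort-by=.metadata.creationTimestamp -o name \
#       | drop_last 3 | xargs -r kubectl delete -n production
```

When fewer than N ConfigMaps exist, the filter prints nothing and the delete step is skipped.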

Safe Deletion Practices

When cleaning up ConfigMaps and Secrets, follow these safety guidelines:

  1. Dry run first: Always review what will be deleted before executing
  2. Backup before deletion: Export resources to YAML files before removing them
  3. Check age: Only delete resources older than a certain threshold (e.g., 30 days)
  4. Exclude system resources: Skip kube-system, kube-public, and other system namespaces
  5. Monitor for impact: Watch application metrics after cleanup to ensure nothing broke

Example backup and conditional deletion:

# Backup before deletion
kubectl get configmap -n production -o yaml > cm-backup-$(date +%Y%m%d).yaml

# Only delete ConfigMaps older than 30 days (date -d requires GNU date)
CUTOFF=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)
kubectl get configmap -n production -o json | \
  jq -r --arg cutoff "$CUTOFF" \
  '.items[] | select(.metadata.creationTimestamp < $cutoff) |
   "\(.metadata.name) \(.metadata.creationTimestamp)"' | \
  while read -r cm created; do
    echo "Would delete: $cm (created: $created)"
    # Uncomment to actually delete:
    # kubectl delete configmap "$cm" -n production
  done

Advanced Patterns for Large-Scale Clusters

For organizations running multiple clusters or large multi-tenant platforms, housekeeping requires more sophisticated approaches.

Policy-Based Cleanup with OPA Gatekeeper

Use OPA Gatekeeper to enforce ConfigMap lifecycle policies at admission time:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: configmaprequiredlabels
spec:
  crd:
    spec:
      names:
        kind: ConfigMapRequiredLabels
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package configmaprequiredlabels

        violation[{"msg": msg}] {
          input.review.kind.kind == "ConfigMap"
          not input.review.object.metadata.labels["app"]
          msg := "ConfigMaps must have an 'app' label for lifecycle tracking"
        }

        violation[{"msg": msg}] {
          input.review.kind.kind == "ConfigMap"
          not input.review.object.metadata.labels["owner"]
          msg := "ConfigMaps must have an 'owner' label for lifecycle tracking"
        }

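The template alone enforces nothing. Once a constraint of kind ConfigMapRequiredLabels is created to activate it, Gatekeeper rejects ConfigMaps that lack the app or owner label, making future tracking and cleanup much easier.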

Centralized Monitoring with Prometheus

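Monitor orphaned resource metrics across your clusters. The script below only prints metrics in the Prometheus text format; to be scraped, its output still has to be served over HTTP or written to a file that node_exporter's textfile collector reads: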

apiVersion: v1
kind: ConfigMap
metadata:
  name: orphan-detection-exporter
data:
  script.sh: |
    #!/bin/bash
    # Expose metrics for Prometheus scraping
    while true; do
      echo "# HELP k8s_orphaned_configmaps Number of orphaned ConfigMaps"
      echo "# TYPE k8s_orphaned_configmaps gauge"

      for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
        count=$(./detect-orphaned-configmaps.sh $ns | grep -c "Orphaned:")
        echo "k8s_orphaned_configmaps{namespace=\"$ns\"} $count"
      done

      sleep 300  # Update every 5 minutes
    done
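The exporter loop above repeats the HELP and TYPE lines on every pass and writes to stdout only. One possible arrangement, assuming node_exporter's textfile collector is in use, is to render a complete snapshot and move it into place atomically. The function name and file paths below are illustrative assumptions.

```shell
# render_orphan_metrics
# Reads "namespace count" pairs on stdin and prints them in the
# Prometheus text exposition format, with HELP and TYPE emitted once.
render_orphan_metrics() {
    echo "# HELP k8s_orphaned_configmaps Number of orphaned ConfigMaps"
    echo "# TYPE k8s_orphaned_configmaps gauge"
    while read -r ns count; do
        [ -n "$ns" ] || continue
        printf 'k8s_orphaned_configmaps{namespace="%s"} %s\n' "$ns" "$count"
    done
}

# Example wiring (paths are assumptions; not run here):
#   render_orphan_metrics < counts.txt > /var/lib/node_exporter/orphans.prom.tmp
#   mv /var/lib/node_exporter/orphans.prom.tmp /var/lib/node_exporter/orphans.prom
```

Writing to a temporary file and renaming it keeps the collector from reading a half-written snapshot.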

Create alerts when orphaned resource counts exceed thresholds:

groups:
- name: kubernetes-housekeeping
  rules:
  - alert: HighOrphanedConfigMapCount
    expr: k8s_orphaned_configmaps > 20
    for: 24h
    labels:
      severity: warning
    annotations:
      summary: "High number of orphaned ConfigMaps in {{ $labels.namespace }}"
      description: "Namespace {{ $labels.namespace }} has {{ $value }} orphaned ConfigMaps"

Multi-Cluster Cleanup with Crossplane or Cluster API

For platform teams managing dozens or hundreds of clusters, extend cleanup automation across your entire fleet:

# Crossplane Composition for cluster-wide cleanup
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: cluster-cleanup-policy
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1
    kind: ClusterCleanupPolicy
  resources:
    - name: cleanup-cronjob
      base:
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
        spec:
          forProvider:
            manifest:
              apiVersion: batch/v1
              kind: CronJob
              # ... CronJob spec from earlier

Housekeeping Checklist for Production Clusters

Here’s a practical checklist to implement sustainable ConfigMap and Secret housekeeping:

Immediate Actions:

  • [ ] Run detection scripts to audit current orphaned resource count
  • [ ] Backup all ConfigMaps and Secrets before any cleanup
  • [ ] Manually review and delete obvious orphans (with team approval)
  • [ ] Document which ConfigMaps/Secrets are intentionally unused but needed

Short-term (1-4 weeks):

  • [ ] Implement consistent labeling standards across teams
  • [ ] Add owner references to all ConfigMaps and Secrets
  • [ ] Deploy scheduled CronJob for automated detection and reporting
  • [ ] Integrate cleanup steps into CI/CD pipelines

Long-term (1-3 months):

  • [ ] Adopt GitOps tooling (ArgoCD, Flux) with automated pruning
  • [ ] Implement OPA Gatekeeper policies for required labels
  • [ ] Set up Prometheus monitoring for orphaned resource metrics
  • [ ] Create runbooks for incident responders
  • [ ] Establish resource quotas per namespace
  • [ ] Conduct quarterly cluster hygiene reviews

Ongoing Practices:

  • [ ] Review orphaned resource reports weekly
  • [ ] Include cleanup tasks in sprint planning
  • [ ] Train new team members on resource lifecycle best practices
  • [ ] Update cleanup automation as cluster architecture evolves

Conclusion

Kubernetes doesn’t automatically clean up orphaned ConfigMaps and Secrets, but with the right strategies, you can prevent them from becoming a problem. The key is implementing a layered approach: use owner references and GitOps for prevention, deploy automated detection for ongoing monitoring, and run scheduled cleanup jobs for maintenance.

Start with detection to understand your current situation, then focus on prevention strategies like owner references and consistent labeling. For existing clusters with accumulated orphaned resources, implement gradual cleanup with proper safety checks rather than aggressive bulk deletion.

Remember that housekeeping isn’t a one-time task—it’s an ongoing operational practice. By building cleanup into your CI/CD pipelines and establishing clear resource ownership, you’ll maintain a clean, secure, and performant Kubernetes environment over time.

The tools and patterns we’ve covered here—from simple bash scripts to sophisticated policy engines—can be adapted to your organization’s scale and maturity level. Whether you’re managing a single cluster or a multi-cluster platform, investing in proper resource lifecycle management pays dividends in operational efficiency, security posture, and team productivity.

Frequently Asked Questions (FAQ)

Can Kubernetes automatically delete unused ConfigMaps and Secrets?

No. Kubernetes does not garbage-collect ConfigMaps or Secrets by default when workloads are deleted. Unless they have ownerReferences set, these resources remain in the cluster indefinitely and must be cleaned up manually or via automation.

Is it safe to delete ConfigMaps or Secrets that are not referenced by running Pods?

Not always. Some resources may be referenced by workloads scaled to zero, CronJobs, or future rollouts. Always perform a dry run, check workload definitions (Deployments, StatefulSets, DaemonSets), and review resource age before deletion.

What is the safest way to prevent orphaned ConfigMaps and Secrets?

The most effective prevention strategies are:

  • Setting ownerReferences on ConfigMaps and Secrets (typically from a controller or your own tooling)
  • Adopting GitOps with pruning enabled (ArgoCD / Flux)
  • Applying consistent labeling (app, owner, version)

Owner references and pruning remove unused resources automatically; consistent labels make whatever remains easy to find and review.

Which tools are best for detecting orphaned resources?

Commonly used tools include:

  • Kor: purpose-built for detecting unused Kubernetes resources
  • Popeye: broader cluster hygiene and sanitization reports
  • Custom scripts/controllers: useful for tailored environments or integrations

Of these, Kor is the most narrowly focused on unused resources, which makes it a sensible starting point for production clusters.

How often should ConfigMap and Secret cleanup run in production?

A common best practice is:

  • Weekly detection (reporting only)
  • Monthly cleanup for resources older than a defined threshold (e.g. 30–60 days)
  • Immediate cleanup integrated into CI/CD after successful deployments

This balances safety with long-term cluster hygiene.

Sources