Prometheus 3.0 and OpenTelemetry: Native OTLP Support Explained

Seven years is a long time in observability. Since Prometheus 2.0 landed in 2017, the ecosystem has been transformed by cloud-native adoption, the rise of distributed tracing, and the emergence of OpenTelemetry as the de facto standard for instrumentation. Prometheus 3.0, released in November 2024, is the project’s answer to that transformation — and its most significant change is the native ability to ingest OpenTelemetry metrics directly, without an intermediary collector standing in the way.

This article goes deep on what Prometheus 3.0 actually changes for platform engineers and cloud architects who are running — or planning to run — OTel-instrumented workloads alongside Prometheus-based monitoring stacks. We will cover the native OTLP ingestion endpoint, UTF-8 metric name support, Remote Write 2.0, migration considerations, and the architectural patterns that still make sense even when native OTLP is available.

What Changed in Prometheus 3.0: The OTel-Relevant Picture

Prometheus 3.0 ships a substantial set of changes. Not all of them are equally relevant to OpenTelemetry integration, so let’s focus on what actually moves the needle for OTel users before diving into each area in detail.

Native OTLP Ingestion

The flagship feature: Prometheus 3.0 ships with a built-in OTLP receiver that exposes an HTTP endpoint accepting metrics in the OpenTelemetry Protocol format. Applications instrumented with any OTel SDK can now push metrics directly to Prometheus without routing through an OpenTelemetry Collector. This is not a sidecar, not a plugin, not an external adapter — it is a first-class endpoint in the Prometheus binary itself.

UTF-8 Metric Names

Prometheus historically restricted metric names to [a-zA-Z_:][a-zA-Z0-9_:]*. OpenTelemetry uses dots and slashes in metric names by convention — http.server.request.duration is a canonical OTel metric name. Prometheus 3.0 lifts this restriction and supports arbitrary UTF-8 characters in metric names and label names, which is the single most important compatibility change for OTel interoperability.

Remote Write 2.0

Remote Write 2.0 replaces the original protocol with a more efficient encoding based on protobuf, adds native histogram support in the wire format, and reduces bandwidth consumption significantly for large-scale deployments. If you are federating metrics to Thanos, Mimir, or Cortex, this matters for operational cost.

New UI

The Prometheus web UI has been completely rewritten. The new UI uses React, supports metric metadata exploration, and provides a significantly improved query-building experience. This is a quality-of-life improvement rather than an architectural change, but it reduces the dependency on external tools like Grafana for ad-hoc investigation.

Breaking Changes Summary

Prometheus 3.0 removes several features that were deprecated in 2.x. The most operationally significant are: removal of the --web.enable-admin-api deprecated flag path, removal of certain legacy storage format options, changes to default scrape timeouts, and stricter validation of configuration that was previously silently accepted. We cover a migration checklist later in this article.

The OTLP Receiver: How It Works and What It Accepts

The OTLP receiver in Prometheus 3.0 is implemented as an optional feature that must be explicitly enabled. Once enabled, it exposes an HTTP endpoint at /api/v1/otlp/v1/metrics that accepts protobuf-encoded OTLP ExportMetricsServiceRequest payloads — the same wire format used by the OpenTelemetry Collector’s OTLP exporter.

What It Accepts (and What It Does Not)

This is critical to understand before you architect around native OTLP ingestion: Prometheus 3.0 OTLP support is metrics-only. It does not accept traces or logs. OTLP is a unified protocol covering all three signals, but Prometheus is a metrics store — the receiver handles only the metrics portion of the OTLP specification.

Supported metric types in the OTLP receiver:

  • Gauge — maps directly to a Prometheus Gauge
  • Sum (monotonic) — maps to a Prometheus Counter
  • Sum (non-monotonic) — maps to a Prometheus Gauge
  • Histogram (explicit bucket) — maps to a Prometheus Histogram
  • ExponentialHistogram — maps to Prometheus Native Histograms (experimental since 2.40, carried forward in 3.0)
  • Summary — maps to a Prometheus Summary

Resource attributes from the OTLP payload — things like service.name, k8s.pod.name, cloud.region — are handled separately from data point attributes. By default, the identifying attributes service.name, service.namespace, and service.instance.id populate the job and instance labels, the remaining resource attributes are written to a dedicated target_info series, and the promote_resource_attributes list lets you copy selected attributes onto every time series as regular labels, balancing context against cardinality.
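
Attributes you choose not to promote are not lost: they can still be joined in at query time from target_info. A minimal PromQL sketch, assuming an underscore-translated counter name and a deployment_environment label on target_info (the actual names depend on your translation and promotion settings):

# Break down the request rate by an attribute that lives only on target_info
sum by (deployment_environment) (
    rate(http_server_request_duration_seconds_count[5m])
  * on (job, instance) group_left (deployment_environment)
    target_info
)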

Enabling the OTLP Receiver

Enabling native OTLP ingestion requires two things: a feature flag and a configuration block in prometheus.yml.

Start the Prometheus binary with the feature flag:

prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --enable-feature=otlp-write-receiver

Then add the OTLP receiver configuration to your prometheus.yml:

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

otlp:
  # Promote these OTLP resource attributes to Prometheus labels
  promote_resource_attributes:
    - service.name
    - service.namespace
    - service.instance.id
    - k8s.namespace.name
    - k8s.pod.name
    - k8s.node.name
    - cloud.region
    - deployment.environment

With this configuration, Prometheus will listen on port 9090 (default) and accept OTLP metrics at http://<prometheus-host>:9090/api/v1/otlp/v1/metrics.

Resource Attribute Promotion Strategy

The promote_resource_attributes list deserves careful thought. OTLP carries rich resource-level context — every metric payload includes a ResourceMetrics object with attributes describing the source: service name, version, environment, Kubernetes pod, node, cluster, cloud provider details, and more. Prometheus labels are flat key-value pairs on each time series. Promoting too many resource attributes explodes cardinality; promoting too few loses important context.

A pragmatic starting list for Kubernetes deployments:

otlp:
  promote_resource_attributes:
    - service.name          # Critical: identifies the service
    - service.namespace     # Logical grouping
    - deployment.environment  # prod/staging/dev
    - k8s.namespace.name    # Kubernetes namespace
    - k8s.pod.name          # Pod-level cardinality — consider omitting in high-scale
    - k8s.node.name         # Useful for infrastructure correlation

Avoid blindly promoting k8s.pod.name at scale — in a cluster with thousands of short-lived pods, this creates significant cardinality pressure. Prefer service.name and service.namespace for most alerting use cases, reserving pod-level labels for debugging dashboards.

UTF-8 Metric Names: Why This Is the Real Game-Changer

To appreciate why UTF-8 metric name support matters so much, you need to understand the friction it eliminates. OpenTelemetry semantic conventions define metric names using dots as namespace separators. The canonical HTTP server duration metric is http.server.request.duration. The canonical database query duration is db.client.operation.duration. These names are standardized across languages and frameworks — your Go service and your Java service and your Python service all emit the same metric name when instrumented with OTel.

Prometheus 2.x could not store these names. The dots are illegal characters in Prometheus metric naming. Every OTel-to-Prometheus bridge — the OpenTelemetry Collector’s Prometheus exporter, prom-client compatibility layers, the older prometheusremotewrite exporter — had to translate these names, typically by replacing dots with underscores: http_server_request_duration.

This translation is lossy and creates multiple problems:

  • Name collisions: http.server.request_duration and http.server.request.duration both become http_server_request_duration
  • Dashboard breakage: Grafana dashboards built against OTel semantic conventions don’t work against translated Prometheus metrics without modification
  • Cross-signal correlation: Trace attributes use dot notation; when metric names differ, automated correlation tools lose the thread
  • Vendor lock-in pressure: Teams end up with separate naming conventions for “Prometheus metrics” vs “OTel metrics” and maintain both

Prometheus 3.0 with UTF-8 support stores http.server.request.duration natively. No translation. No collision. The metric name you instrument with is the metric name you query.

Enabling UTF-8 Metric Names

UTF-8 metric names require the utf8-names feature flag:

prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --enable-feature=utf8-names \
  --enable-feature=otlp-write-receiver

Once enabled, PromQL queries must use quoted metric names when the name contains characters outside the legacy character set:

# Legacy metric name — unquoted works fine
http_server_requests_total

# OTel metric name with dots — requires quoting in PromQL
{"http.server.request.duration"}

# Equivalent form using an explicit __name__ matcher
{__name__="http.server.request.duration"}

# Quoted metric names combine with ordinary label matchers
{"http.server.request.duration", service_name="api-gateway"}

The PromQL parser in Prometheus 3.0 has been updated to handle quoted metric names as a first-class construct. Grafana’s PromQL engine has also been updated to handle this syntax — verify your Grafana version (10.3+ has full support) before deploying.

OTel SDK to Prometheus 3.0 Directly: No Collector Required

For teams that only need to get application metrics into Prometheus, native OTLP ingestion enables a dramatically simpler architecture. Here’s what it looks like with different OTel SDKs.

Go (OpenTelemetry SDK)

package main

import (
    "context"
    "time"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
    "go.opentelemetry.io/otel/sdk/metric"
    "go.opentelemetry.io/otel/sdk/resource"
    semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func initMetrics(ctx context.Context) (*metric.MeterProvider, error) {
    res, err := resource.New(ctx,
        resource.WithAttributes(
            semconv.ServiceName("my-api"),
            semconv.ServiceNamespace("platform"),
            semconv.DeploymentEnvironment("production"),
        ),
    )
    if err != nil {
        return nil, err
    }

    // Point directly at Prometheus 3.0 OTLP endpoint
    exporter, err := otlpmetrichttp.New(ctx,
        otlpmetrichttp.WithEndpoint("prometheus:9090"),
        otlpmetrichttp.WithURLPath("/api/v1/otlp/v1/metrics"),
        otlpmetrichttp.WithInsecure(), // Use WithTLSClientConfig for production
    )
    if err != nil {
        return nil, err
    }

    provider := metric.NewMeterProvider(
        metric.WithResource(res),
        metric.WithReader(
            metric.NewPeriodicReader(exporter,
                metric.WithInterval(30*time.Second),
            ),
        ),
    )

    otel.SetMeterProvider(provider)
    return provider, nil
}

Python (OpenTelemetry SDK)

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.resources import Resource, SERVICE_NAME, SERVICE_NAMESPACE

resource = Resource.create({
    SERVICE_NAME: "my-api",
    SERVICE_NAMESPACE: "platform",
    "deployment.environment": "production",
})

exporter = OTLPMetricExporter(
    endpoint="http://prometheus:9090/api/v1/otlp/v1/metrics",
)

reader = PeriodicExportingMetricReader(
    exporter,
    export_interval_millis=30_000,
)

provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(provider)

# Use the meter
meter = metrics.get_meter("my-api")
request_counter = meter.create_counter(
    name="http.server.request.count",
    description="Total HTTP server requests",
    unit="1",
)
request_duration = meter.create_histogram(
    name="http.server.request.duration",
    description="HTTP server request duration",
    unit="s",
)

Java (OpenTelemetry SDK with Spring Boot)

# application.properties (Spring Boot with OTel auto-instrumentation)
otel.service.name=my-api
otel.resource.attributes=service.namespace=platform,deployment.environment=production

# Configure OTLP exporter to push directly to Prometheus
otel.metrics.exporter=otlp
otel.exporter.otlp.metrics.endpoint=http://prometheus:9090/api/v1/otlp/v1/metrics
otel.exporter.otlp.metrics.protocol=http/protobuf

# Export interval
otel.metric.export.interval=30000

With Spring Boot and the OTel Java agent, no code changes are required beyond configuration — the agent instruments your HTTP server, database clients, and messaging systems automatically and pushes metrics using the names defined in OTel semantic conventions.
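
As a point of reference, attaching the agent is a single JVM flag. The paths below are illustrative; otel.javaagent.configuration-file simply points the agent at a properties file containing the otel.* keys shown above:

# Attach the OTel Java agent at startup (paths are illustrative)
java -javaagent:/opt/otel/opentelemetry-javaagent.jar \
     -Dotel.javaagent.configuration-file=/opt/app/otel.properties \
     -jar my-api.jar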

OTel Collector to Prometheus 3.0: When You Need the Intermediary

Native OTLP ingestion is compelling, but the OpenTelemetry Collector remains relevant for a significant set of use cases. Understanding when each pattern is appropriate is the core architectural decision you will face when adopting Prometheus 3.0 in an OTel environment.

Pattern 1: OTel Collector as Fan-Out Gateway

When you need to send metrics to multiple backends simultaneously — Prometheus for alerting, a long-term store like Thanos for historical analysis, and a commercial observability platform for full-stack correlation — the OTel Collector handles fan-out efficiently. Applications push once to the Collector; the Collector distributes to all backends.

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 1000
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  # Push to Prometheus 3.0 via OTLP
  otlphttp/prometheus:
    endpoint: http://prometheus:9090/api/v1/otlp
    tls:
      insecure: true

  # Fan-out to Thanos via remote_write
  prometheusremotewrite/thanos:
    endpoint: http://thanos-receive:10908/api/v1/receive
    resource_to_telemetry_conversion:
      enabled: true

  # Fan-out to commercial backend
  otlp/datadog:
    endpoint: https://otel-intake.datadoghq.com
    headers:
      DD-API-KEY: "${DD_API_KEY}"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/prometheus, prometheusremotewrite/thanos, otlp/datadog]

Pattern 2: Collector for Metric Transformation

The OTel Collector’s transform processor and metricstransform processor allow you to reshape metrics before they reach Prometheus: rename labels, add static attributes, filter out high-cardinality series, aggregate metrics to reduce storage cost, or apply unit conversions. These operations are not available in Prometheus’s native OTLP receiver.

processors:
  transform/metrics:
    metric_statements:
      - context: datapoint
        statements:
          # Drop data point attributes whose keys match a pattern
          - delete_matching_keys(attributes, "internal.*")
          # Normalize environment label values
          - set(attributes["deployment.environment"], "prod") where attributes["deployment.environment"] == "production"

  filter/drop_debug:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - '.*\.debug\..*'
          - 'runtime\.go\.internal\..*'

  metricstransform:
    transforms:
      # Rename a metric to match your existing Prometheus naming convention
      - include: http.server.request.duration
        action: update
        new_name: http_server_request_duration_seconds

Pattern 3: Collector for Traces and Logs (Always Required)

If your architecture includes traces and logs alongside metrics — and in 2025 it almost certainly does — you need an OTel Collector regardless of what you do with metrics. Prometheus does not accept traces or logs. Jaeger, Tempo, and Loki all have their own ingestion protocols. The Collector is the universal routing layer for the three pillars of observability.

In this architecture, it is usually simpler to route all three signals through the Collector and let it push metrics to Prometheus via OTLP or remote_write, rather than splitting metrics to go directly and everything else through the Collector.
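
A sketch of that layout is shown below. The Tempo and Loki endpoints are illustrative, and the logs exporter assumes a Loki 3.x deployment exposing its native OTLP ingestion path — adjust to whatever backends you actually run:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlphttp/prometheus:   # metrics -> Prometheus 3.0 native OTLP endpoint
    endpoint: http://prometheus:9090/api/v1/otlp
  otlp/tempo:            # traces -> Grafana Tempo via OTLP gRPC
    endpoint: tempo:4317
    tls:
      insecure: true
  otlphttp/loki:         # logs -> Loki's native OTLP ingestion path (Loki 3.x)
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp/prometheus]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
    logs:
      receivers: [otlp]
      exporters: [otlphttp/loki]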

When to Use Native OTLP vs. OTel Collector: Decision Framework

Scenario | Native OTLP | OTel Collector
Single metrics backend (Prometheus only) | Preferred | Overkill
Multiple metrics backends | Not sufficient | Required
Traces + logs in scope | Not applicable | Required
Metric transformation/filtering needed | Not supported | Required
Simple Kubernetes-native deployment | Preferred | Additional complexity
Air-gapped / constrained environments | Preferred (fewer components) | Consider carefully
Mixed OTel + legacy Prometheus targets | Works alongside scraping | Can normalize naming
High-volume, need batching/buffering | Limited control | Preferred

The pragmatic recommendation for most platform engineering teams: if you are already running the OTel Collector (and you should be if traces are in scope), continue routing metrics through it. Use the Collector’s otlphttp exporter to push to Prometheus 3.0. Reserve the direct SDK-to-Prometheus pattern for simple services where the Collector would be the only reason to add complexity.

Remote Write 2.0: What Changes for Existing Setups

Remote Write 2.0 is a significant protocol upgrade with real operational implications for teams using Prometheus as a metrics source for long-term storage systems like Thanos, Mimir, VictoriaMetrics, or Cortex.

Key Protocol Changes

  • More compact protobuf encoding — the Remote Write 2.0 message interns repeated strings in a shared symbol table and remains snappy-compressed, typically a 50-70% reduction in wire size for large metric batches
  • Native histogram support in the wire format — exponential histograms can now be forwarded without converting to classic histograms, preserving full resolution
  • Metadata forwarding — metric type and unit information is now transmitted alongside samples, enabling better downstream processing
  • Created timestamps — the timestamp at which a counter was created is forwarded, enabling more accurate rate calculations across restarts

Configuring Remote Write 2.0

# prometheus.yml
remote_write:
  - url: "http://thanos-receive:10908/api/v1/receive"
    # Remote Write 2.0 is negotiated automatically with compatible receivers
    # Forward native (exponential) histograms in the remote write payload:
    send_native_histograms: true
    metadata_config:
      send: true
      send_interval: 1m
    queue_config:
      capacity: 10000
      max_shards: 200
      max_samples_per_send: 2000
      batch_send_deadline: 5s

Remote Write 2.0 uses protocol content negotiation — Prometheus 3.0 will attempt RW2.0 first and fall back to RW1.0 if the receiver does not support it. This means upgrades are generally backward-compatible. Verify that your receiving system (Thanos Receive 0.35+, Mimir 2.12+, VictoriaMetrics 1.98+) supports RW2.0 before expecting the benefits.

Migration from Prometheus 2.x: Breaking Changes Checklist

Upgrading from Prometheus 2.x to 3.0 requires attention to several breaking changes. This checklist covers the operationally significant ones for teams running production Prometheus deployments.

Configuration Changes

  • Changed: query.lookback-delta default — the default changed from 5 minutes to match the scrape interval. Queries that relied on the 5m default may return different results. Audit alerting rules that use instant queries on counters.
  • Changed: remote_write queue options — the semantics of remote_write[].queue_config.capacity changed. Review and update queue configurations.
  • Removed: storage.tsdb.allow-overlapping-blocks flag — overlapping blocks handling is now automatic. Remove this flag from your startup scripts.
  • Changed: scrape defaults — Prometheus 3.0 negotiates the OpenMetrics format with targets that support it (enabling native histograms but potentially surfacing parsing differences), and the former no-default-scrape-port behavior is now the default, so ports are no longer appended automatically to target addresses. Review scrape configs that relied on the old defaults.
  • Agent mode changes — if using Prometheus Agent mode, review the updated configuration options for WAL management.

PromQL Changes

  • Stricter parsing — some previously accepted but technically invalid PromQL expressions now fail. Run your alerting rules through promtool check rules against a Prometheus 3.0 binary before cutover.
  • Native histogram functions — histogram_fraction() is new, and histogram_quantile() behaves differently when applied to native histograms. Existing dashboard queries using histogram_quantile() on classic histograms continue to work unchanged (see the comparison below).
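
For illustration, here is the same quantile computed against a classic histogram and a native histogram (metric names are illustrative):

# Classic histogram: aggregate the per-bucket series, keeping the le label
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Native histogram: apply the function directly to the histogram series
histogram_quantile(0.95, sum(rate(http_request_duration_seconds[5m])))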

Storage Compatibility

Prometheus 3.0 can read existing 2.x TSDB data. The upgrade path does not require a data migration. However, Prometheus 2.x cannot read data blocks written by 3.0 (downgrade is not supported without data loss after any writes have occurred). Take a snapshot before upgrading if you need rollback capability:

# Take a TSDB snapshot before upgrading
curl -X POST http://prometheus:9090/api/v1/admin/tsdb/snapshot

# Verify the snapshot exists
ls /prometheus/snapshots/

Pre-Upgrade Validation Steps

# 1. Validate configuration against Prometheus 3.0
docker run --rm --entrypoint promtool \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus:v3.0.0 \
  check config /etc/prometheus/prometheus.yml

# 2. Validate alerting rules
docker run --rm --entrypoint sh \
  -v $(pwd)/rules:/etc/prometheus/rules \
  prom/prometheus:v3.0.0 \
  -c 'promtool check rules /etc/prometheus/rules/*.yml'

# 3. Run in parallel (shadow mode) before full cutover
# Deploy Prometheus 3.0 alongside 2.x, scraping the same targets
# Compare query results between versions using promtool query range
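
For the comparison step itself, promtool can run the same expression against both servers; the hostnames and query below are placeholders:

promtool query instant http://prometheus-2x:9090 'sum(rate(http_requests_total[5m]))'
promtool query instant http://prometheus-3x:9090 'sum(rate(http_requests_total[5m]))'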

Practical Kubernetes Deployment Example

Here is a production-ready Kubernetes deployment of Prometheus 3.0 with OTLP ingestion enabled, suitable as a starting point for platform engineering teams.

Prometheus 3.0 ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: production
        region: eu-west-1

    otlp:
      promote_resource_attributes:
        - service.name
        - service.namespace
        - deployment.environment
        - k8s.namespace.name
        - k8s.pod.name

    rule_files:
      - /etc/prometheus/rules/*.yml

    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)

    remote_write:
      - url: http://thanos-receive.monitoring.svc.cluster.local:10908/api/v1/receive
        send_native_histograms: true
        metadata_config:
          send: true

Prometheus 3.0 Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v3.0.0
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus/data
            - --storage.tsdb.retention.time=15d
            - --web.enable-lifecycle
            - --web.enable-admin-api
            - --enable-feature=otlp-write-receiver
            - --enable-feature=utf8-names
            - --enable-feature=native-histograms
          ports:
            - name: http
              containerPort: 9090
              protocol: TCP
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: data
              mountPath: /prometheus/data
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              cpu: 2000m
              memory: 8Gi
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: http
            initialDelaySeconds: 30
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /-/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: prometheus-config
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-data
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  selector:
    app: prometheus
  ports:
    - name: http
      port: 9090
      targetPort: http
  type: ClusterIP

Configuring Applications to Push OTLP

With this deployment, any application in the cluster can push OTLP metrics by setting the following environment variables (works with any OTel SDK supporting OTLP HTTP):

env:
  - name: OTEL_SERVICE_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['app']
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: OTEL_METRICS_EXPORTER
    value: "otlp"
  - name: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
    value: "http://prometheus.monitoring.svc.cluster.local:9090/api/v1/otlp/v1/metrics"
  - name: OTEL_EXPORTER_OTLP_METRICS_PROTOCOL
    value: "http/protobuf"
  - name: OTEL_METRIC_EXPORT_INTERVAL
    value: "30000"
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "service.namespace=platform,deployment.environment=production,k8s.namespace.name=$(NAMESPACE)"

This approach works particularly well in environments using the OTel Operator for Kubernetes, where the Instrumentation CRD can inject these environment variables automatically into pods based on namespace or pod label selectors — zero-touch instrumentation with native Prometheus storage.
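
A minimal sketch of such an Instrumentation resource, assuming the OpenTelemetry Operator is installed (field names may vary slightly between operator versions):

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: default-instrumentation
  namespace: monitoring
spec:
  env:
    - name: OTEL_METRICS_EXPORTER
      value: otlp
    - name: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
      value: http://prometheus.monitoring.svc.cluster.local:9090/api/v1/otlp/v1/metrics
    - name: OTEL_EXPORTER_OTLP_METRICS_PROTOCOL
      value: http/protobuf
    - name: OTEL_METRIC_EXPORT_INTERVAL
      value: "30000"

Pods opt in through the operator's instrumentation.opentelemetry.io/inject-* annotations, and the environment variables are injected at admission time.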

Frequently Asked Questions

Can I use Prometheus 3.0 OTLP ingestion for traces and logs?

No. The Prometheus 3.0 OTLP receiver handles only metrics. Prometheus is a metrics store — it has no data model for traces or logs. For traces, you need a backend like Jaeger or Grafana Tempo. For logs, you need Loki, Elasticsearch, or a similar system. The OTel Collector is the appropriate routing layer when you need to send all three signals to their respective backends from a single application-side push endpoint.

Does the kube-prometheus-stack Helm chart support Prometheus 3.0?

Yes, with caveats. The kube-prometheus-stack chart updated its Prometheus image to 3.0 starting with chart version 66.0.0. However, some bundled recording rules and alerting rules may need adjustment for the PromQL changes and default behavioral differences. The Prometheus Operator itself (version 0.78+) has been updated to support the new configuration options including the otlp configuration block. If you are managing Prometheus via the Operator, you enable the feature flags through the Prometheus CRD (spec.enableFeatures or spec.additionalArgs) and supply the otlp settings through the Operator's corresponding OTLP configuration support on the CRD.

What happens to existing Prometheus 2.x metric names when I enable UTF-8 support?

Existing metrics with underscore-based names continue to work exactly as before. Enabling UTF-8 support is purely additive — it allows the storage and querying of metric names containing dots and other UTF-8 characters, but it does not rename or modify existing metrics. Your existing dashboards, alerting rules, and recording rules continue to function without modification. Only metrics ingested via OTLP (or exposed by exporters using OTel naming conventions) will use dot-separated names.

How does native OTLP ingestion affect Prometheus’s pull model?

It coexists with it. Prometheus 3.0 continues to scrape targets via the pull model on the same 9090 port. The OTLP endpoint is an additional ingestion path, not a replacement for scraping. You can have a Prometheus instance simultaneously scraping Kubernetes pods via service discovery and receiving OTLP push metrics from applications — both are stored in the same TSDB and queryable via the same PromQL interface. This hybrid approach is common during migrations, where legacy components are scraped and new OTel-instrumented services push via OTLP.

Is the Prometheus 3.0 OTLP receiver suitable for high-volume production workloads?

For moderate volumes, yes. The OTLP receiver is synchronous — the HTTP request completes only after the samples are written to the WAL. Under very high ingestion rates (hundreds of thousands of samples per second), this can create back-pressure that affects application latency. The OTel Collector handles this better through internal buffering, retry queues, and batch processing. For high-volume scenarios, the recommended pattern is: applications push to OTel Collector (which acknowledges immediately and buffers), Collector pushes to Prometheus via OTLP or remote_write in optimized batches. For the majority of Kubernetes workloads — dozens to hundreds of services with typical metric cardinality — the native OTLP receiver performs well without an intermediary.

Helm Values JSON Schema: Validate Your values.yaml Before It Breaks Production

Helm is the de facto package manager for Kubernetes, and values.yaml is its primary interface for configuration. Yet for years, that interface has been completely unvalidated by default — a free-form YAML file where any key can be anything, where typos silently pass through, and where misconfigured deployments only reveal themselves when pods fail to start in production. The values.schema.json file changes that equation entirely. This article explains why schema validation matters, how to implement it properly, and how to integrate it into a modern CI/CD pipeline.

The Problem: Silent Failures in Production

Consider a platform team managing dozens of Helm releases across multiple clusters. A developer submits a values override file with replicaCount: "3" instead of replicaCount: 3 — a string where an integer is expected. Or they set image.pullPolicy: Allways with a typo. Or they omit a required secret reference that the application needs to boot. In all three cases, Helm without schema validation will happily render the templates, produce Kubernetes manifests, and apply them to the cluster. The failure surfaces later — sometimes much later — as a CrashLoopBackOff, an ImagePullBackOff, or a subtle runtime error that takes hours to debug.

This is not a hypothetical scenario. It is the daily reality for teams operating at scale without values validation. The root cause is architectural: Helm templates use Go’s text/template engine, which is weakly typed and permissive by design. A template that does {{ .Values.replicaCount }} will render whether the value is an integer, a string, or even a boolean. The resulting Kubernetes manifest may be invalid, but that error only surfaces when the Kubernetes API server rejects it — or worse, accepts it but interprets it differently than intended.

The consequences compound at scale. When a chart is used by multiple teams, the lack of a formal contract for acceptable values means every consumer has to read through template files and comments to understand what inputs are valid. There is no machine-readable specification. There is no IDE support. There is no guardrail. The only documentation is whatever the chart author happened to write in comments inside values.yaml — and comments do not stop a CI pipeline from shipping a broken deployment.

What Is values.schema.json

Since Helm 3.0.0, released in November 2019, Helm supports an optional values.schema.json file at the root of a chart directory — the same level as Chart.yaml and values.yaml. This file is a JSON Schema draft-07 document that formally describes the structure, types, constraints, and required fields for the chart’s values.

When this file is present, Helm automatically validates the merged values (defaults from values.yaml merged with any user-supplied overrides) against the schema at multiple points: during helm install, helm upgrade, helm template, and helm lint. If validation fails, Helm refuses to proceed and prints a human-readable error message identifying exactly which value failed and why. This transforms a class of runtime failures into build-time failures — the correct direction for any production system.

The choice of JSON Schema draft-07 specifically is worth noting. Draft-07 is widely supported by tooling, including the Red Hat YAML extension for VS Code, JetBrains IDEs, and most JSON Schema validators. It introduced the if/then/else conditional keywords that are particularly useful for Helm charts. More recent drafts (2019-09, 2020-12) offer additional features but have less universal tooling support, making draft-07 the pragmatic choice for chart authors today.

Chart Directory Structure

my-app/
├── Chart.yaml
├── values.yaml
├── values.schema.json      ← lives here
├── charts/
└── templates/
    ├── deployment.yaml
    ├── service.yaml
    ├── ingress.yaml
    └── _helpers.tpl

The schema file is included when a chart is packaged with helm package and distributed through chart repositories. Consumers of the chart get schema validation automatically without any additional configuration — the guardrails ship with the chart itself.

How Helm Uses the Schema

Helm’s validation behavior is straightforward but has some nuances worth understanding. When Helm processes a release, it first merges all value sources in order of increasing precedence: chart defaults (values.yaml), parent chart values, -f value files, and finally --set flags. The merged result is then validated against the schema as a single operation.

This means the schema validates the effective values, not each source in isolation. A required field that has a default in values.yaml will pass validation even when not specified by the user, because the merged result includes the default. This is the correct behavior — it validates what will actually be used during rendering.
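
For example, in a hypothetical install where a values file and a --set flag override the chart defaults, the schema is evaluated once against the final merged result:

# defaults from values.yaml <- staging-values.yaml <- --set (highest precedence)
helm install my-release ./my-app \
  -f staging-values.yaml \
  --set image.tag=2.1.0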

The validation happens before template rendering. If schema validation fails, Helm exits with a non-zero status code and prints all validation errors. The error output is structured and actionable:

$ helm install my-release ./my-app --set replicaCount=abc

Error: values don't meet the specifications of the schema(s) in the following chart(s):
my-app:
- replicaCount: Invalid type. Expected: integer, given: string

For helm lint, which is typically used in CI pipelines without installing to a cluster, schema validation also runs. This makes helm lint a powerful pre-deployment gate when schema files are present.

IDE Benefits: Autocompletion and Inline Validation

Beyond Helm’s own validation, values.schema.json unlocks IDE support that significantly improves the developer experience when working with values files. The Red Hat YAML extension for VS Code can reference a JSON Schema file to provide autocompletion, type checking, and inline error highlighting for YAML files.

To enable this, add a yaml.schemas configuration to your VS Code workspace settings or the user settings file:

// .vscode/settings.json
{
  "yaml.schemas": {
    "./my-app/values.schema.json": "./my-app/values.yaml"
  }
}

With this configuration, editing values.yaml in VS Code will show autocompletion for defined keys, inline errors for type mismatches, and hover documentation pulled from the description fields in your schema. For platform teams maintaining internal Helm charts, this transforms the chart into a self-documenting, IDE-aware configuration interface — without any additional tooling investment.

JetBrains IDEs (IntelliJ IDEA, GoLand, etc.) support JSON Schema associations through the Languages & Frameworks > Schemas and DTDs > JSON Schema Mappings settings panel, providing equivalent functionality for teams using those tools.

Building the Schema: A Practical Guide

Let’s build a complete, realistic example. Start with a typical values.yaml for a web application chart:

# values.yaml
replicaCount: 2

image:
  repository: myorg/my-app
  tag: "1.0.0"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: false
  hostname: ""
  tls: false

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

config:
  logLevel: info
  databaseUrl: ""

nodeSelector: {}
tolerations: []
affinity: {}

Now the full values.schema.json that validates this structure:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "my-app Helm Chart Values",
  "description": "Configuration values for the my-app Helm chart",
  "type": "object",
  "additionalProperties": false,
  "required": ["image", "service"],
  "$defs": {
    "resourceQuantity": {
      "type": "string",
      "pattern": "^[0-9]+(\\.[0-9]+)?(m|Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$",
      "description": "A Kubernetes resource quantity (e.g. 100m, 128Mi, 1Gi)"
    }
  },
  "properties": {
    "replicaCount": {
      "type": "integer",
      "minimum": 0,
      "maximum": 50,
      "default": 2,
      "description": "Number of pod replicas. Set to 0 to scale down."
    },
    "image": {
      "type": "object",
      "additionalProperties": false,
      "required": ["repository", "tag"],
      "description": "Container image configuration",
      "properties": {
        "repository": {
          "type": "string",
          "minLength": 1,
          "description": "Container image repository"
        },
        "tag": {
          "type": "string",
          "pattern": "^[a-zA-Z0-9._-]+$",
          "minLength": 1,
          "description": "Image tag. Avoid using 'latest' in production."
        },
        "pullPolicy": {
          "type": "string",
          "enum": ["Always", "IfNotPresent", "Never"],
          "default": "IfNotPresent",
          "description": "Kubernetes imagePullPolicy"
        }
      }
    },
    "service": {
      "type": "object",
      "additionalProperties": false,
      "required": ["type", "port"],
      "properties": {
        "type": {
          "type": "string",
          "enum": ["ClusterIP", "NodePort", "LoadBalancer", "ExternalName"],
          "description": "Kubernetes Service type"
        },
        "port": {
          "type": "integer",
          "minimum": 1,
          "maximum": 65535,
          "description": "Service port"
        }
      }
    },
    "ingress": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "enabled": {
          "type": "boolean",
          "default": false
        },
        "hostname": {
          "type": "string",
          "description": "Ingress hostname. Required when ingress.enabled is true."
        },
        "tls": {
          "type": "boolean",
          "default": false,
          "description": "Enable TLS for the ingress"
        }
      },
      "if": {
        "properties": {
          "enabled": { "const": true }
        },
        "required": ["enabled"]
      },
      "then": {
        "required": ["hostname"],
        "properties": {
          "hostname": {
            "minLength": 1,
            "pattern": "^[a-zA-Z0-9]([a-zA-Z0-9\\-\\.]+)?[a-zA-Z0-9]$"
          }
        }
      }
    },
    "resources": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "requests": {
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "cpu": { "$ref": "#/$defs/resourceQuantity" },
            "memory": { "$ref": "#/$defs/resourceQuantity" }
          }
        },
        "limits": {
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "cpu": { "$ref": "#/$defs/resourceQuantity" },
            "memory": { "$ref": "#/$defs/resourceQuantity" }
          }
        }
      }
    },
    "autoscaling": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "enabled": {
          "type": "boolean",
          "default": false
        },
        "minReplicas": {
          "type": "integer",
          "minimum": 1
        },
        "maxReplicas": {
          "type": "integer",
          "minimum": 1,
          "maximum": 100
        },
        "targetCPUUtilizationPercentage": {
          "type": "integer",
          "minimum": 1,
          "maximum": 100
        }
      },
      "if": {
        "properties": {
          "enabled": { "const": true }
        },
        "required": ["enabled"]
      },
      "then": {
        "required": ["minReplicas", "maxReplicas"]
      }
    },
    "config": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "logLevel": {
          "type": "string",
          "enum": ["debug", "info", "warn", "error"],
          "default": "info",
          "description": "Application log level"
        },
        "databaseUrl": {
          "type": "string",
          "description": "Database connection URL"
        }
      }
    },
    "nodeSelector": {
      "type": "object",
      "description": "Node selector labels for pod scheduling"
    },
    "tolerations": {
      "type": "array",
      "description": "Pod tolerations"
    },
    "affinity": {
      "type": "object",
      "description": "Pod affinity rules"
    }
  }
}

Key Schema Patterns Explained

additionalProperties: false

This is arguably the most important pattern in a Helm schema. Without it, unknown keys pass validation silently — which defeats much of the purpose. With "additionalProperties": false, any key not listed in properties causes a validation error. This catches typos like repicaCount instead of replicaCount, which would otherwise silently use the default value and leave the developer wondering why their override had no effect.

Apply it at every nested object level, not just the root. A typo inside image: or resources: is just as dangerous as one at the top level.

$defs for Reusable Definitions

The $defs keyword (formalized in draft 2019-09; draft-07 itself defines the equivalent definitions keyword, and most validators resolve $ref pointers to either location) provides a namespace for reusable schema fragments. In the example above, resourceQuantity is defined once and referenced via $ref in both requests and limits. This avoids duplication and ensures consistent validation logic across related fields.

For larger charts, $defs becomes essential. Common patterns include reusable schemas for image configurations, resource requirements, probe configurations, and environment variable maps.
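
As one hypothetical example, a shared probe definition referenced by both liveness and readiness settings might look like this fragment:

"$defs": {
  "probe": {
    "type": "object",
    "additionalProperties": false,
    "properties": {
      "enabled": { "type": "boolean", "default": true },
      "initialDelaySeconds": { "type": "integer", "minimum": 0 },
      "periodSeconds": { "type": "integer", "minimum": 1 },
      "timeoutSeconds": { "type": "integer", "minimum": 1 }
    }
  }
},
"properties": {
  "livenessProbe": { "$ref": "#/$defs/probe" },
  "readinessProbe": { "$ref": "#/$defs/probe" }
}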

Conditional Validation with if/then/else

The if/then/else construct in JSON Schema draft-07 is particularly powerful for Helm charts, where many values are conditional on a feature toggle. The ingress example above demonstrates this: when ingress.enabled is true, the hostname field becomes required and must match a valid hostname pattern. When ingress is disabled, the hostname can be empty or omitted entirely.

This pattern can be extended for more complex scenarios. For example, flagging that when autoscaling.enabled is true, the standalone replicaCount is ignored (since the HPA controls the replica count). A hard prohibition would conflict with the replicaCount default in values.yaml after merging, so the schema below only annotates the field rather than forbidding it:

{
  "if": {
    "properties": {
      "autoscaling": {
        "properties": {
          "enabled": { "const": true }
        },
        "required": ["enabled"]
      }
    }
  },
  "then": {
    "properties": {
      "replicaCount": {
        "description": "replicaCount is ignored when autoscaling is enabled"
      }
    }
  }
}

Pattern Validation for Image Tags

The image tag field is a common source of production issues. Teams accidentally deploy with latest, which is non-deterministic and makes rollbacks unreliable. A pattern constraint can enforce semantic versioning or at least ban the latest tag in production charts:

"tag": {
  "type": "string",
  "not": {
    "enum": ["latest", ""]
  },
  "pattern": "^[0-9]+\\.[0-9]+\\.[0-9]+",
  "description": "Semantic version tag required. 'latest' is not permitted."
}

This enforces that image tags start with a semantic version number, immediately rejecting latest, empty strings, or arbitrary branch names that would produce non-reproducible deployments.

Enum for Controlled Vocabularies

Fields with a fixed set of valid values — Kubernetes service types, image pull policies, log levels — should use enum. This is more precise than a pattern and produces clearer error messages. It also enables IDE autocompletion to show exactly the valid options as a pick-list, rather than requiring the developer to remember or look up acceptable values.

CI/CD Integration

GitHub Actions

The most direct integration point is helm lint, which runs schema validation as part of its checks. A minimal GitHub Actions workflow that validates a chart on every pull request looks like this:

# .github/workflows/helm-lint.yaml
name: Helm Lint

on:
  pull_request:
    paths:
      - 'charts/**'

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Helm
        uses: azure/setup-helm@v4
        with:
          version: '3.14.0'

      - name: Lint chart with default values
        run: helm lint charts/my-app

      - name: Lint chart with staging values
        run: helm lint charts/my-app -f charts/my-app/ci/staging-values.yaml

      - name: Lint chart with production values
        run: helm lint charts/my-app -f charts/my-app/ci/production-values.yaml

      - name: Validate template rendering
        run: |
          helm template my-app charts/my-app \
            -f charts/my-app/ci/production-values.yaml \
            --debug > /dev/null

The ci/ directory convention (values files specifically for CI testing) is a pattern from the chart-testing tool and works well for validating multiple realistic value combinations, not just the defaults.

For teams using the ct (chart-testing) CLI tool from the Helm project, schema validation is automatically included in the ct lint command, which also handles chart versioning checks and YAML linting:

      - name: Set up chart-testing
        uses: helm/chart-testing-action@v2.6.1

      - name: Run chart-testing lint
        run: ct lint --target-branch ${{ github.event.repository.default_branch }}

Pre-commit Hooks

For local development, pre-commit hooks catch issues before code is even pushed. The pre-commit framework makes this straightforward:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gruntwork-io/pre-commit
    rev: v0.1.23
    hooks:
      - id: helmlint

  - repo: local
    hooks:
      - id: helm-schema-validate
        name: Helm Schema Validation
        language: script
        entry: scripts/validate-helm-schemas.sh
        files: ^charts/.*values.*\.yaml$

#!/usr/bin/env bash
# scripts/validate-helm-schemas.sh
set -euo pipefail

for chart_dir in charts/*/; do
  if [[ -f "${chart_dir}/values.schema.json" ]]; then
    echo "Linting ${chart_dir}..."
    helm lint "${chart_dir}" --strict
  fi
done

ArgoCD and Flux Integration

Both ArgoCD and Flux (Helm Controller) invoke helm template internally when reconciling Helm releases. Since helm template runs schema validation when a schema file is present, any invalid values in a HelmRelease or ArgoCD Application manifest will cause the reconciliation to fail with a clear error message — visible in the controller logs and surfaced as a degraded resource status. No additional configuration is required; schema validation is automatic.
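
For instance, a Flux HelmRelease like the sketch below (API version and chart source are illustrative) has its values block validated against the chart's schema on every reconciliation; invalid values leave the release in a failed state rather than reaching the cluster:

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: my-app
  namespace: apps
spec:
  interval: 10m
  chart:
    spec:
      chart: my-app
      version: "1.4.x"
      sourceRef:
        kind: HelmRepository
        name: internal-charts
  values:
    replicaCount: 3          # validated against values.schema.json during reconciliation
    image:
      repository: myorg/my-app
      tag: "1.4.2"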

Generating Schemas from Existing Charts

For charts that already have a well-structured values.yaml, writing a schema from scratch is time-consuming but not starting from zero. Several tools can generate a draft schema that you then refine:

  • helm-values-schema-json — a Helm plugin (helm plugin install https://github.com/losisin/helm-values-schema-json) that introspects values.yaml and generates a draft schema with inferred types. Run with helm schema-gen values.yaml.
  • json-schema-generator online tools — paste your values as JSON (convert YAML to JSON first) and get a draft schema back.
  • Manually from scratch — for new charts, writing the schema alongside the values file from the beginning is the most accurate approach and requires no extra tooling.

Generated schemas are always starting points. They infer types from existing values but cannot know about intended constraints, enums, patterns, required fields in conditional cases, or additionalProperties: false at nested levels. Manual review and refinement is always necessary.

Common Mistakes and How to Avoid Them

Mistake | Symptom | Fix
Missing additionalProperties: false | Typos in key names pass validation silently | Add it at every object level, including nested objects
Schema only at root level | Nested typos go undetected | Apply additionalProperties: false recursively
Not including defaults in schema | IDE shows fields as required when they are optional | Add default to all optional fields
Overly strict patterns blocking valid values | Legitimate deployments fail schema validation | Test patterns against your real value space before shipping
Mixing $defs and definitions | $defs is draft 2019-09+ terminology; definitions is the draft-07 keyword | Most validators resolve $ref to either; pick one and use it consistently
Schema not committed to the chart repo | Consumers get no validation when pulling from repository | Always commit values.schema.json alongside the chart
Validating subchart values through parent schema | Schema errors for subchart values the parent doesn't own | Do not attempt to validate subchart values in parent schema; each chart owns its own schema

The Null Value Problem

A subtle but common issue: in YAML, an unset key with no value (key:) resolves to null, not an empty string or zero. If your schema defines a field as "type": "string", a null value will fail validation. To handle optional fields that users might leave blank, use a type union:

"databaseUrl": {
  "type": ["string", "null"],
  "description": "Database connection URL. Leave null to use the default."
}

Alternatively, ensure your values.yaml defaults use empty strings ("") rather than bare keys, and document that convention for chart consumers.

Schema Drift

As charts evolve, new values get added to values.yaml without corresponding updates to values.schema.json. Over time the schema becomes stale and provides partial coverage. The fix is procedural: treat schema updates as part of the definition of done for any PR that modifies values. Code review should include checking that new or modified values have corresponding schema entries.
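
A lightweight CI guard can also flag drift mechanically. The sketch below (assuming yq v4 and jq are available) compares the top-level keys in values.yaml against the properties declared in the schema:

#!/usr/bin/env bash
# Hypothetical drift check: top-level values keys vs. schema properties
set -euo pipefail

chart_dir="${1:-charts/my-app}"

values_keys=$(yq eval 'keys | .[]' "${chart_dir}/values.yaml" | sort)
schema_keys=$(jq -r '.properties | keys[]' "${chart_dir}/values.schema.json" | sort)

if ! diff <(echo "${values_keys}") <(echo "${schema_keys}"); then
  echo "Schema drift: values.yaml and values.schema.json disagree on top-level keys" >&2
  exit 1
fi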

Frequently Asked Questions

Does values.schema.json validate subchart values?

No. Each chart in a dependency relationship validates only its own values against its own schema. If chart A depends on chart B, and chart B has a schema, chart B’s schema validates the values under the b: key in chart A’s values.yaml — but only when processed in the context of chart B itself. Chart A’s schema should not attempt to describe chart B’s values structure. This is by design: it maintains loose coupling between charts and allows subcharts to evolve their schemas independently.

Can I use JSON Schema draft-2020-12 instead of draft-07?

Technically, Helm does not strictly enforce which draft version you use — it uses the Go library github.com/xeipuuv/gojsonschema, which supports draft-04 through draft-07. Using newer draft keywords that are not supported by this library may cause them to be silently ignored rather than throwing an error. For IDE support, draft-07 has the broadest compatibility. If you need features from newer drafts (like unevaluatedProperties from 2020-12), test carefully to confirm they are enforced by Helm’s validator and not silently skipped.

How do I handle values that differ between environments without schema conflicts?

The schema should describe all valid values across all environments. Use enum to enumerate all valid values for a field, and use if/then/else for constraints that only apply in certain configurations. The schema is a contract for what the chart accepts, not a policy for what a specific environment should use. Environment-specific policies (such as “production must use a minimum of 3 replicas”) are better enforced at a higher level — through admission controllers like OPA Gatekeeper or Kyverno — rather than in the chart schema itself.

Does schema validation run when using helm template for dry runs?

Yes. helm template runs schema validation before rendering templates. This makes it useful as a validation step in CI pipelines even without a live cluster: helm template release-name ./chart -f values-override.yaml will fail with schema errors if the values are invalid, and will output the rendered manifests if they are valid. Piping the output to kubectl apply --dry-run=client -f - adds an additional layer of Kubernetes API validation for a thorough offline check.
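
Putting the two steps together in a pipeline-friendly one-liner:

helm template my-release ./my-app -f values-override.yaml \
  | kubectl apply --dry-run=client -f -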

Should I add values.schema.json to charts I don’t maintain (upstream charts)?

For upstream charts you consume but do not maintain (such as Bitnami charts, ingress-nginx, cert-manager), the recommended approach is to maintain a separate JSON Schema file in your own GitOps repository that validates your specific values overlay files. Tools like jsonschema (Python) or ajv (Node.js) can validate a YAML/JSON values file against a schema in CI without Helm being involved. This gives you schema validation for your environment-specific overrides without needing to modify upstream chart sources.
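
A minimal sketch of that approach, assuming yq v4, the ajv-cli npm package, and a schema file you maintain yourself for the upstream chart's values:

# Convert the environment-specific overrides to JSON, then validate them
yq -o=json eval '.' prod-overrides.yaml > prod-overrides.json
ajv validate -s upstream-values.schema.json -d prod-overrides.json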

After NGINX Ingress Controller: Alternatives and Migration Guide

After NGINX Ingress Controller: Alternatives and Migration Guide

If you manage Kubernetes clusters in production, the last 18 months have been uncomfortable. Two of the most widely deployed NGINX-based Ingress Controllers have faced critical security vulnerabilities, deprecation announcements, and shifting maintenance responsibilities — all while the Kubernetes project accelerates its push toward a new traffic management standard. This is not a drill. Teams running ingress-nginx or the F5/NGINX Ingress Controller need a clear picture of what changed, what it means for their clusters, and what their realistic options are going forward.

First, Clear the Confusion: There Are Two NGINX Ingress Controllers

One of the most persistent sources of confusion in the Kubernetes networking space is that there are two completely different projects both called “NGINX Ingress Controller,” maintained by different organizations, with different architectures and different licensing.

ingress-nginx (kubernetes/ingress-nginx)

This is the community-maintained controller under the Kubernetes project umbrella, hosted at github.com/kubernetes/ingress-nginx. It uses the open-source NGINX as its data plane, configured via Lua scripting and dynamically generated nginx.conf files. This is the controller most teams end up with when they follow the official Kubernetes documentation or install from the Helm chart referenced in the ingress guide. It is free, open-source, and until recently was considered the default choice.

NGINX Ingress Controller (nginxinc/kubernetes-ingress)

This is the commercial and open-source controller maintained by F5/NGINX, hosted at github.com/nginxinc/kubernetes-ingress. It also supports NGINX Plus (the commercial version with enhanced features like active health checks, JWT authentication, and advanced load balancing). The architecture is different — it uses native NGINX APIs rather than the Lua-heavy approach — and it targets enterprise customers looking for support contracts and advanced capabilities.

These two controllers are not interchangeable. Configuration annotations differ, Helm chart values differ, and behavior under edge cases differs substantially. Understanding which one your cluster runs is the necessary starting point for any decision about migration.

# Check which NGINX IC you are actually running
kubectl get pods -n ingress-nginx -o jsonpath='{.items[*].spec.containers[*].image}'

# Community controller image looks like:
# registry.k8s.io/ingress-nginx/controller:v1.x.x

# F5/NGINX controller image looks like:
# nginx/nginx-ingress:x.x.x  or  private-registry.nginx.com/nginx-ic/nginx-plus-ingress:x.x.x

What Actually Happened: A Timeline of Disruption

The ingress-nginx CVEs (2025)

In March 2025, security researchers disclosed a set of critical vulnerabilities in ingress-nginx under the collective name IngressNightmare (CVE-2025-1097, CVE-2025-1098, CVE-2025-1974, CVE-2025-24514). The most severe of these, rated CVSS 9.8, allowed unauthenticated remote code execution against the ingress-nginx admission webhook. An attacker with network access to the admission controller could craft a malicious Ingress object to inject arbitrary NGINX configuration, ultimately achieving code execution in the controller pod — which in many clusters runs with elevated permissions and access to service account tokens across namespaces.

The vulnerabilities affected the vast majority of ingress-nginx deployments in the wild. Wiz Research, which discovered and disclosed the issues, estimated that approximately 43% of cloud environments were exposed. Patches were released in versions 1.11.5 and 1.12.1, but the incident forced uncomfortable questions about the controller’s security posture and the architecture decisions (particularly the admission webhook design) that made it possible.

Maintenance Concerns in ingress-nginx

Beyond the CVEs, the ingress-nginx project has faced ongoing concerns about maintainer bandwidth. The project is maintained by a small group of volunteers and relies heavily on community contributions. Issue response times slowed, pull requests aged, and the pace of feature development declined relative to alternatives. For a component as critical as the cluster ingress layer, this created legitimate concern about long-term sustainability without corporate backing or broader contributor growth.

F5/NGINX Deprecation Announcement

On the commercial side, F5/NGINX announced in early 2025 that the nginxinc/kubernetes-ingress controller — particularly its open-source tier — would undergo significant changes. F5 signaled a strategic shift toward NGINX Gateway Fabric, their implementation of the Kubernetes Gateway API specification. The message was clear: investment in the Ingress-based controller would be reduced, and customers were encouraged to plan migrations toward Gateway API-native solutions.

For teams running NGINX Plus-based ingress with support contracts, this was a significant business concern. The product they had licensed and standardized on was being steered toward end-of-life on the Ingress API, even if exact timelines remained somewhat ambiguous in the initial announcements.

Real Impact on Production Clusters

The practical consequences depend heavily on which controller you run and how your clusters are configured. Here is an honest assessment:

Immediate Security Risk

If you run ingress-nginx and have not patched to 1.11.5+ or 1.12.1+, your admission webhook is a critical attack surface. Patching is non-negotiable and should have happened already. The admission webhook can be disabled if you are not using it for validation (many teams are not), which significantly reduces the attack surface while you plan a longer-term migration.

# Check your current ingress-nginx version
kubectl get deployment ingress-nginx-controller -n ingress-nginx \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Verify admission webhook is configured
kubectl get validatingwebhookconfigurations | grep ingress

# If you need to disable the webhook temporarily (reduces but does not eliminate risk):
kubectl delete validatingwebhookconfiguration ingress-nginx-admission

Operational Uncertainty

Even after patching, the underlying questions remain. Teams are now asking: should we invest in hardening and tuning ingress-nginx knowing it may not be the strategic direction? Should we migrate now when it is our choice, rather than later when it may be forced? NGINX IC customers, meanwhile, are evaluating whether their licensing costs justify continued investment in a product being steered toward deprecation.

Configuration Migration Complexity

The real cost of migration is in the annotation-heavy configurations that accumulate over time. Teams that have built complex routing logic using nginx.ingress.kubernetes.io/* annotations — custom headers, rate limiting, auth snippets, rewrite rules, canary traffic splitting — face significant rework when switching controllers. This is the primary reason many teams are reluctant to move despite clear signals that a transition is coming.
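To make the rework concrete, here is a hedged sketch of one common case: a prefix-stripping rewrite expressed as an ingress-nginx annotation versus a roughly equivalent Gateway API HTTPRoute filter. The names, paths, and port are illustrative, and heavier constructs such as auth or configuration snippets have no one-to-one mapping at all.

# ingress-nginx: strip a path prefix via annotation on the Ingress
#   metadata:
#     annotations:
#       nginx.ingress.kubernetes.io/rewrite-target: /

# Roughly equivalent intent as a Gateway API HTTPRoute filter
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: legacy-rewrite           # illustrative name
spec:
  # parentRefs and hostnames omitted for brevity
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /legacy
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /
    backendRefs:
    - name: legacy-service       # illustrative backend
      port: 8080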

The Alternatives: An Honest Evaluation

There is no shortage of Ingress controller options. The question is which alternatives are mature enough for production workloads at scale, and what trade-offs each brings.

Traefik

Traefik Proxy (and its Kubernetes-native version via Traefik Hub) has emerged as the most popular alternative for teams leaving ingress-nginx. It supports the standard Kubernetes Ingress API for drop-in compatibility, its own IngressRoute CRDs for advanced features, and Kubernetes Gateway API. It is written in Go, has strong TLS automation via Let’s Encrypt, and has excellent observability with built-in metrics and a real-time dashboard.

Trade-offs: Traefik’s configuration model is different enough from NGINX that complex routing logic requires rethinking rather than translating. Performance under very high connection counts is generally good but NGINX has a longer track record in extreme-scale deployments. The commercial offering (Traefik Hub) adds API gateway capabilities but introduces vendor dependency.

Envoy Gateway

Envoy Gateway is now a CNCF project and implements the Kubernetes Gateway API natively using Envoy as its data plane. This is arguably the most strategically aligned option for teams that want to bet on the future of Kubernetes networking. Envoy is battle-tested (it powers Istio, Contour, and large-scale service meshes at companies like Lyft and Google), and the Gateway API implementation is comprehensive and actively developed.

Trade-offs: Envoy Gateway is relatively young as a standalone project. Teams unfamiliar with Envoy will face a steeper learning curve for debugging and custom configuration. The operational model differs significantly from NGINX-based controllers. However, for greenfield deployments or teams willing to invest in the transition, this is a strong forward-looking choice.

Cilium Gateway API

If your cluster already runs Cilium as the CNI, enabling Gateway API support is a natural evolution. Cilium’s Gateway API implementation leverages eBPF for high-performance packet processing, avoiding the overhead of userspace proxy hops entirely. It is deeply integrated with Cilium’s network policy model and observability stack (Hubble).

Trade-offs: This option is only relevant if you are already committed to Cilium as your CNI, or are willing to make that switch simultaneously. Migrating both the CNI and the ingress layer at the same time is a significant operational risk. For Cilium shops, however, this consolidates complexity and provides excellent performance and observability.

HAProxy Ingress

HAProxy Ingress Controller is maintained by the HAProxy Technologies team and has a strong reputation for raw performance and precise traffic control. It supports both Ingress and Gateway API and has a long track record in high-throughput production environments. For teams with existing HAProxy expertise, it provides a familiar mental model for load balancing configuration.

Trade-offs: Smaller community than Traefik or NGINX. Less ecosystem tooling and fewer tutorials. Best suited for teams that specifically want HAProxy’s capabilities (fine-grained connection management, advanced health checking, TCP/HTTP mode flexibility) rather than as a default choice.

Kong Ingress Controller

Kong bridges the gap between an Ingress controller and a full API gateway. It supports Ingress and Gateway API resources alongside its own Kong-native plugin system for authentication, rate limiting, transformation, and observability. For teams that need API gateway capabilities rather than pure L7 routing, Kong provides a unified platform.

Trade-offs: Kong adds operational complexity. Running Kong requires either a PostgreSQL database (DB-mode) or careful management of declarative configuration (DB-less mode). The plugin ecosystem is powerful but introduces additional configuration surface. For teams that just need ingress routing, Kong may be more than necessary. For teams building API platforms, it is worth the overhead.

Istio Gateway

Istio’s ingress gateway (now aligned with Gateway API via its Kubernetes Gateway API integration) provides entry-point traffic management as part of a full service mesh. If your organization is planning or running Istio for east-west traffic, using Istio’s gateway for north-south traffic creates a unified data plane (Envoy) and consistent observability across all service communication.

Trade-offs: Istio is a serious operational commitment. The control plane overhead, the learning curve, and the impact on pod scheduling and sidecar management are significant. Choosing Istio purely for ingress replacement is like buying a race car because you needed a vehicle with good brakes. Consider this path only if service mesh capabilities are on your roadmap.

NGINX Gateway Fabric (F5’s Gateway API implementation)

F5/NGINX is building NGINX Gateway Fabric as their strategic forward path — an NGINX-based implementation of the Kubernetes Gateway API. For teams heavily invested in NGINX and wanting to stay in that ecosystem while moving to Gateway API, this provides a migration path within familiar territory. It is still maturing but represents where F5 is putting its development resources.

Comparison Matrix

Controller | Ingress API | Gateway API | Maturity | Best For | Complexity
ingress-nginx | Yes | Partial | High | Existing deployments, familiar config | Low
Traefik | Yes | Yes | High | General purpose, rapid migration | Low-Medium
Envoy Gateway | No | Yes (native) | Medium | Greenfield, future-aligned | Medium
Cilium Gateway | Yes | Yes | Medium | Cilium CNI clusters | Low (if Cilium)
HAProxy Ingress | Yes | Yes | High | High-throughput, HAProxy expertise | Medium
Kong | Yes | Yes | High | API gateway requirements | High
Istio Gateway | Via Gateway API | Yes | High | Service mesh adopters | Very High
NGINX Gateway Fabric | No | Yes (native) | Low-Medium | NGINX shops moving to Gateway API | Medium

Gateway API: The Strategic Direction You Cannot Ignore

The Kubernetes Gateway API is not simply “Ingress v2.” It is a fundamentally richer traffic management model designed to address the limitations that drove teams to annotation-based workarounds for the past several years. Understanding it is essential regardless of which controller you choose, because the ecosystem is clearly converging on it.

The core resource hierarchy consists of GatewayClass (defines a type of gateway, created by infrastructure providers), Gateway (a specific instance of a listener configuration, typically managed by platform teams), and HTTPRoute, TCPRoute, GRPCRoute, and other route resources (managed by application teams). This separation of concerns maps cleanly onto organizational roles — infrastructure teams control the gateway, application teams control their routing rules.

# Example Gateway API resources replacing an ingress-nginx Ingress
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: production-gateway
  namespace: infra
spec:
  gatewayClassName: nginx  # or envoy, traefik, cilium, etc.
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - name: wildcard-tls
    allowedRoutes:
      namespaces:
        from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app
  namespace: my-app-namespace
spec:
  parentRefs:
  - name: production-gateway
    namespace: infra
  hostnames:
  - "app.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: my-api-service
      port: 8080
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: my-frontend-service
      port: 3000

Gateway API reached v1.0 (GA for HTTPRoute and Gateway) in October 2023, and v1.1 followed in 2024 with GRPCRoute graduation and expanded features. The project has broad support across controllers (Traefik, Envoy Gateway, Cilium, NGINX Gateway Fabric, Kong, Istio, and others all implement it). The Ingress API is not being removed from Kubernetes, but new feature development is effectively frozen — Gateway API is where capabilities like traffic weighting, header manipulation, request mirroring, and backend protocol configuration are being built.

Decision Framework: Stay, Migrate, or Evaluate?

There is no universal right answer. The following framework helps teams make a context-appropriate decision rather than following hype or panic.

Stay on ingress-nginx if:

  • You have patched to 1.11.5+ or 1.12.1+ and have disabled or hardened the admission webhook
  • Your cluster is stable, heavily annotation-dependent, and migration cost outweighs risk
  • You have internal NGINX expertise and can take ownership of monitoring the project’s maintenance health
  • Your organization has a short-term horizon (decommissioning or major platform change within 12-18 months)

Migrate now if:

  • You are running the F5/NGINX IC with a support contract that is being deprecated
  • Your cluster has moderate annotation complexity and you have engineering cycles available
  • You are planning a major Kubernetes version upgrade or cluster rebuild — do it at the same time
  • Your security team has flagged the CVE history as unacceptable for your risk profile
  • You are building a new cluster or platform team and want to standardize on Gateway API from the start

Evaluate before committing if:

  • Your workloads have complex traffic requirements (WebSockets, gRPC, canary deployments, header-based routing) that differ significantly across controllers
  • You are considering Gateway API but the specific controllers in your environment have not graduated their Gateway API implementations yet
  • You have multi-cluster or multi-tenant requirements that change the analysis
  • You need to assess total cost including commercial support, tooling changes, and team retraining

Migration Checklist

For teams that have decided to migrate, the following sequence reduces risk and ensures nothing critical is missed:

Phase 1: Inventory and Assessment

  • Enumerate all Ingress resources across all namespaces and document their annotations
  • Identify annotations with no direct equivalent in your target controller
  • Map TLS certificate sources (cert-manager, Secrets, external providers) and confirm compatibility
  • Document any custom NGINX configuration snippets (nginx.ingress.kubernetes.io/configuration-snippet, server-snippet) — these are high-risk items that require manual translation
  • Inventory any rate limiting, authentication, or WAF configurations layered on the controller
# Enumerate all ingress resources and their annotations across the cluster
kubectl get ingress -A -o json | jq -r '
  .items[] |
  {
    namespace: .metadata.namespace,
    name: .metadata.name,
    annotations: (.metadata.annotations // {} | keys)
  }
'

Phase 2: Target Controller Validation

  • Deploy target controller to a non-production cluster with identical Ingress/HTTPRoute resources
  • Validate TLS termination, redirect behavior, and timeout configurations (see the smoke-test sketch after this list)
  • Run load tests to confirm performance characteristics match expectations
  • Validate observability — metrics, logs, and traces integrate with your existing stack
  • Test failure scenarios: backend unavailability, certificate expiry, controller pod restart
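A minimal smoke-test sketch for the TLS, redirect, and timeout checks; the hostname and path are placeholders for whatever you expose through the candidate controller in the test cluster.

# Verify HTTP-to-HTTPS redirect behavior on the new controller
curl -sI http://app.staging.example.com/ | head -n 5

# Verify TLS termination and inspect the served certificate
curl -svI https://app.staging.example.com/ 2>&1 | grep -E 'subject:|issuer:|expire'

# Spot-check response codes and latency against configured timeouts
curl -s -o /dev/null -w 'http_code=%{http_code} total_time=%{time_total}s\n' \
  https://app.staging.example.com/api/slow-endpoint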

Phase 3: Staged Production Migration

  • Deploy new controller to production alongside existing controller (different IngressClass)
  • Migrate low-risk, low-traffic Ingress resources first by updating their ingressClassName
  • Use DNS-based canary switching (weighted routing at the DNS level) rather than switching entire IngressClass at once
  • Monitor error rates and latency for 24-48 hours after each batch migration
  • Migrate critical services during low-traffic windows with rollback plan documented
  • Decommission old controller only after all resources are migrated and validated
# Migrate individual Ingress to new controller by changing ingressClassName
kubectl patch ingress my-app -n my-namespace \
  --type='json' \
  -p='[{"op": "replace", "path": "/spec/ingressClassName", "value": "traefik"}]'

# Or if migrating to Gateway API, create equivalent HTTPRoute first,
# test it, then remove the old Ingress resource
kubectl apply -f my-app-httproute.yaml
# Validate, then:
kubectl delete ingress my-app -n my-namespace

Phase 4: Gateway API Adoption (Optional but Recommended)

  • Install Gateway API CRDs if not already present (kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/standard-install.yaml)
  • Define GatewayClass resources matching your chosen controller (a sketch follows this list)
  • Migrate Ingress resources to HTTPRoute progressively, starting with simpler configurations
  • Update CI/CD pipelines and Helm charts to generate HTTPRoute instead of Ingress resources for new services
  • Establish a policy: new services use Gateway API; legacy services migrate on their next significant update
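For the GatewayClass step, a minimal sketch looks like the following; the controllerName value is defined by whichever implementation you chose and must be copied from that project's documentation, so the value below is only a placeholder.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: production
spec:
  # Substitute the controller identifier documented by your implementation
  # (Envoy Gateway, Traefik, Cilium, NGINX Gateway Fabric each publish their own value)
  controllerName: example.com/your-gateway-controller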

Recommendation

For most platform engineering teams reading this in 2025, the pragmatic recommendation is as follows:

Short term (next 30 days): Patch ingress-nginx to the latest release if you are still on it. Assess and harden or disable the admission webhook. This is not optional.

Medium term (3-6 months): Evaluate Traefik or Envoy Gateway against your specific workload requirements. Traefik is the lower-friction migration for teams coming from ingress-nginx on the Ingress API. Envoy Gateway is the stronger strategic choice if you are willing to commit to Gateway API fully. Either way, run a parallel deployment in a non-production environment and measure the delta in operational overhead.

Long term (6-18 months): Plan migration to Gateway API resources regardless of which data plane you choose. The Ingress API will not disappear overnight, but feature parity with Gateway API capabilities will never arrive. Teams that standardize on Gateway API now build the institutional knowledge that will be valuable as the ecosystem continues to evolve.

If you are running F5/NGINX IC under a support contract: engage your F5 account team now to get a clear timeline on the deprecation path and evaluate NGINX Gateway Fabric as a within-ecosystem migration before looking at alternatives. The question is not whether to migrate but when and to what.

Avoid the temptation to treat this as a purely technical decision. The switch of an ingress controller touches CI/CD pipelines, monitoring dashboards, runbooks, on-call playbooks, and engineering team knowledge. Factor in the total transition cost, not just the YAML changes.

Frequently Asked Questions

Is the Kubernetes Ingress API being deprecated or removed?

No. The networking.k8s.io/v1 Ingress API is not deprecated and there are no current plans to remove it from Kubernetes. It will continue to work. What is happening is that the Kubernetes SIG Network has frozen new feature development on the Ingress API and is directing all new traffic management capabilities to Gateway API. In practical terms, if you need a capability that Ingress does not currently provide, you will not get it through Ingress. You will need Gateway API. Existing Ingress resources will continue to function for the foreseeable future.

Can I run two Ingress controllers simultaneously during migration?

Yes, and this is the recommended approach for production migrations. Kubernetes supports multiple IngressClass resources in a cluster, each backed by a different controller. Ingress resources select their controller via the spec.ingressClassName field (or the legacy kubernetes.io/ingress.class annotation). You can run ingress-nginx and Traefik side-by-side, migrating individual Ingress resources by updating their ingressClassName. Once migration is complete and validated, decommission the old controller. Just make sure that only one controller is marked as the default IngressClass at any given time, as two defaults cause conflicts.
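A minimal sketch of the side-by-side setup: two IngressClass resources where only one carries the default-class annotation. The controller strings are the ones each project documents; double-check them against your installed versions.

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
  annotations:
    ingressclass.kubernetes.io/is-default-class: "true"   # exactly one default per cluster
spec:
  controller: k8s.io/ingress-nginx
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: traefik
spec:
  controller: traefik.io/ingress-controller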

What happens to cert-manager if I switch controllers?

cert-manager is independent of your Ingress controller and will continue to work regardless of which controller you use. The HTTP-01 challenge solver in cert-manager creates temporary Ingress resources to complete ACME challenges — these will use whichever IngressClass you configure in your Issuer or ClusterIssuer. If you migrate to Gateway API, cert-manager has added Gateway API support (HTTPRoute-based HTTP-01 challenges) starting from version 1.14. DNS-01 challenges are entirely unaffected by controller choice. Update your Issuer configuration to reference the new IngressClass during migration.
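The corresponding cert-manager change is usually a one-line edit to the HTTP-01 solver; a hedged ClusterIssuer sketch, with the ACME email and secret name as placeholders:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform-team@example.com          # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod-account-key      # placeholder
    solvers:
    - http01:
        ingress:
          class: traefik   # point challenge Ingresses at the new controller during migration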

How severe is the performance difference between ingress-nginx and alternatives?

For the vast majority of production workloads, the performance difference between mature controllers (ingress-nginx, Traefik, HAProxy, Envoy) is not the deciding factor. All of them can handle tens of thousands of requests per second on reasonable hardware, and the bottleneck is typically the backend services, not the ingress layer. The notable exception is Cilium with eBPF-based forwarding, which eliminates userspace proxy overhead entirely and can show measurable latency reduction at high percentiles for latency-sensitive workloads. If you are running at a scale where ingress controller throughput is actually the constraint, you already have the engineering resources to benchmark your specific workload profile against candidate controllers before committing.

Should we just move everything to a cloud provider’s managed load balancer and skip the in-cluster controller?

This is a legitimate option for teams on managed Kubernetes (EKS, GKE, AKS). Cloud-native load balancers (AWS ALB via AWS Load Balancer Controller, GKE Gateway, Azure Application Gateway Ingress Controller) eliminate the operational burden of managing an in-cluster controller and integrate deeply with cloud IAM, WAF, and observability services. The trade-offs are cost (cloud LBs charge per rule and per hour), vendor lock-in, and reduced portability. For purely cloud-native workloads with no multi-cloud or on-premises requirements, cloud-managed load balancers are worth serious consideration and sidestep the ingress-nginx problem entirely. For hybrid or multi-cluster environments, in-cluster controllers maintain an advantage in consistency and portability.

Prometheus Scalability: High Cardinality and How to Fix It

Prometheus has become the de facto standard for metrics collection in cloud-native environments. Its pull-based model, powerful query language, and deep Kubernetes integration make it an obvious choice for platform teams. But as organizations scale — more services, more replicas, more labels — Prometheus starts showing cracks. Queries slow down, memory usage balloons, and what was once a reliable monitoring backbone becomes an operational liability. This article examines exactly why that happens and what you can do about it, from quick tactical fixes to full architectural overhauls.

The Cardinality Problem: Why It Kills Prometheus

Cardinality is the single most important concept to understand when troubleshooting Prometheus scalability. In the context of time series databases, cardinality refers to the total number of unique label combinations that exist across all your metrics. Every unique combination creates a distinct time series, and Prometheus must store, index, and query each of them independently.

Consider a simple HTTP request counter: http_requests_total. If you label it with method (GET, POST, PUT, DELETE), status_code (200, 201, 400, 404, 500, 503), and endpoint (50 distinct API paths), you already have 4 × 6 × 50 = 1,200 time series from a single metric. Now add a customer_id label with 10,000 distinct values. You have just created 12 million time series from one counter.

This is the cardinality explosion pattern, and it is the most common cause of Prometheus degradation in production. The problem is compounded by labels that have unbounded or high-entropy values:

  • User IDs or session tokens embedded in labels
  • Request IDs or trace IDs (effectively infinite cardinality)
  • Pod names without proper aggregation, especially in autoscaling environments
  • Free-form error messages or SQL query strings
  • IP addresses, particularly in environments with high churn

The relationship between cardinality and resource consumption is roughly proportional to the number of active series, and each series carries significant fixed overhead in memory indexing structures. Prometheus stores its head block (the most recent data) entirely in memory. Each time series in the head block requires approximately 3–4 KB of RAM for the series itself plus index entries. A Prometheus instance with 1 million active time series will typically consume 4–6 GB of RAM just for the head block, before accounting for query processing overhead.

Memory Explosion Patterns and Real Symptoms

Memory issues in Prometheus rarely announce themselves cleanly. Instead, they manifest through a cascade of symptoms that are easy to misdiagnose. Understanding the failure modes helps you identify the root cause faster and apply the right remedy.

The Head Block Growth Pattern

Prometheus keeps a two-hour window of data in memory as the head block before compacting it to disk. If your series count grows continuously — which happens when pod churn creates new series faster than old ones expire — the head block never shrinks. You can monitor this directly with prometheus_tsdb_head_series and prometheus_tsdb_head_chunks. A healthy instance shows this number plateauing. A cardinality problem shows it growing monotonically until OOM.
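A hedged example of an alert built on that series; the growth threshold and windows are illustrative and should be tuned against your own baseline.

groups:
  - name: prometheus-self-monitoring
    rules:
      - alert: PrometheusHeadSeriesGrowth
        # Head series grew more than ~20% over six hours: likely a cardinality leak
        expr: |
          prometheus_tsdb_head_series > 1.2 * (prometheus_tsdb_head_series offset 6h)
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Active series on {{ $labels.instance }} growing faster than expected"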

Query Timeout Cascades

As series count grows, even well-written PromQL queries that worked fine at 100k series become unbearably slow at 1M. Grafana dashboards start timing out, alert evaluation lags behind schedule, and Alertmanager begins receiving delayed or duplicated firing alerts. The prometheus_rule_evaluation_duration_seconds metric is a reliable early warning — when p99 evaluation time for your recording rules exceeds your evaluation interval, you have a problem.

Scrape Failures Under Memory Pressure

When Prometheus is under heavy memory pressure, its Go garbage collector starts spending more time collecting, which introduces latency into the scrape loop. Scrapes begin timing out, causing gaps in your data. This creates a deceptive situation where you have gaps in metrics precisely when your system is under stress — exactly when you need monitoring most. Watch for drops in the up metric and for increases in prometheus_target_scrapes_exceeded_sample_limit_total to catch these patterns.

Compaction Pressure

High cardinality also stresses the TSDB compaction process. Prometheus compacts head block data into persistent blocks every two hours. With millions of series, compaction can take tens of seconds to minutes, during which write performance degrades. prometheus_tsdb_compaction_duration_seconds rising above 30 seconds is a warning sign. Compaction failures leave orphaned blocks on disk, gradually consuming storage and potentially corrupting the TSDB if left unaddressed.

Short-Term Fixes: Tactical Remediation

When you are dealing with a Prometheus instance under active stress, you need immediate relief before you can implement architectural changes. These techniques can be applied quickly and provide meaningful headroom while longer-term solutions are planned.

Recording Rules: Pre-Computing Aggregations

Recording rules are the most underutilized tool in the Prometheus toolbox. They allow you to pre-compute expensive PromQL expressions and store the results as new time series. The key benefit for scalability is that you can aggregate away high-cardinality dimensions, dramatically reducing the number of series that dashboards and alerts need to query at runtime.

Consider an example where you have per-pod HTTP request rates with labels for pod, namespace, service, method, and status_code. Your dashboards mostly need service-level aggregations, not per-pod breakdowns. A recording rule can produce that aggregation once per evaluation interval:

groups:
  - name: http_aggregations
    interval: 30s
    rules:
      - record: job:http_requests_total:rate5m
        expr: |
          sum by (job, namespace, method, status_code) (
            rate(http_requests_total[5m])
          )

      - record: job:http_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (job, namespace, le) (
              rate(http_request_duration_seconds_bucket[5m])
            )
          )

      - record: namespace:http_requests_total:rate5m
        expr: |
          sum by (namespace, status_code) (
            rate(http_requests_total[5m])
          )

Notice that the pod label is dropped in all three rules. If you had 500 pods, you have just reduced the cardinality of these series by a factor of 500. Dashboards querying job:http_requests_total:rate5m instead of computing rate(http_requests_total[5m]) on the fly will return results orders of magnitude faster.

The naming convention level:metric:operations is the Prometheus community standard. Following it consistently makes recording rules self-documenting and helps teams understand the aggregation level at a glance.

Metric Dropping via Relabeling

Relabeling gives you surgical control over what metrics Prometheus actually ingests. There are two stages where relabeling applies: relabel_configs (applied before scraping, based on target metadata) and metric_relabel_configs (applied after scraping, based on scraped metric names and labels). For cardinality control, metric_relabel_configs is your primary tool.

Dropping entire metric families that you do not use is the most impactful change you can make. Many exporters emit dozens of metrics that are irrelevant for most use cases:

scrape_configs:
  - job_name: kubernetes-pods
    metric_relabel_configs:
      # Drop metrics we never query
      - source_labels: [__name__]
        regex: 'go_gc_.*|go_memstats_.*|process_.*'
        action: drop

      # Drop high-cardinality label values while keeping the metric
      - source_labels: [__name__, pod]
        regex: 'http_requests_total;.*'
        target_label: pod
        replacement: ''

      # Drop unneeded histogram buckets by listing the "le" values to discard
      # (an action of "keep" here would drop every non-matching series from the scrape)
      - source_labels: [__name__, le]
        regex: 'http_request_duration_seconds_bucket;(0\.005|0\.01|0\.025|0\.05|0\.1)'
        action: drop

      # Replace high-cardinality endpoint paths with normalized versions
      - source_labels: [endpoint]
        regex: '/api/v1/users/[0-9]+'
        target_label: endpoint
        replacement: '/api/v1/users/:id'

Be careful with metric_relabel_configs: they are applied to every scraped sample, so broad regex alternations on high-frequency scrapes add measurable CPU overhead. Prometheus anchors relabeling regexes on both ends and evaluates them with Go's RE2 engine (which does not backtrack), so the main lever is keeping patterns specific and testing them against realistic sample sets.

Cardinality Limits as a Safety Net

Prometheus 2.x introduced per-scrape sample limits as a defensive mechanism. These do not solve cardinality problems but prevent a single misbehaving exporter from taking down your entire Prometheus instance:

global:
  # Default per-scrape sample limit applied to every job (0 = no limit)
  sample_limit: 0  # 0 = no limit

scrape_configs:
  - job_name: application-pods
    # Reject scrapes that return more than 50k samples
    sample_limit: 50000

    # Limit the number of labels allowed on each scraped sample
    label_limit: 64

    # Limit label name and value lengths
    label_name_length_limit: 256
    label_value_length_limit: 1024

    kubernetes_sd_configs:
      - role: pod

When a scrape exceeds sample_limit, Prometheus rejects the entire scrape and marks the target as having failed. This is a hard circuit breaker, not a graceful degradation — the target’s up metric goes to 0. Set limits conservatively above your expected maximum to avoid false positives, and alert on prometheus_target_scrapes_exceeded_sample_limit_total > 0.
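A minimal alert for that circuit breaker, built directly on the counter mentioned above:

groups:
  - name: scrape-limit-alerts
    rules:
      - alert: ScrapeSampleLimitExceeded
        # Fires if any target was rejected for exceeding sample_limit in the last 15 minutes
        expr: increase(prometheus_target_scrapes_exceeded_sample_limit_total[15m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "A scrape target exceeded its configured sample_limit"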

Architectural Solutions: Federation and Remote Write

Once you have exhausted tactical optimizations or when your scale genuinely exceeds what a single Prometheus instance can handle, architectural changes become necessary. Prometheus offers two built-in mechanisms for scaling horizontally: federation and remote_write.

Federation: Hierarchical Scraping

Prometheus federation allows one Prometheus instance to scrape aggregated metrics from other Prometheus instances via the /federate endpoint. In a typical setup, leaf-level Prometheus instances collect raw metrics from targets, while a global Prometheus instance federates pre-aggregated recording rule results from the leaves.

# Global Prometheus configuration federating from regional instances
scrape_configs:
  - job_name: federate-regions
    scrape_interval: 15s
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        # Only federate pre-aggregated recording rule metrics
        - '{__name__=~"job:.*"}'
        - '{__name__=~"namespace:.*"}'
        - '{__name__=~"cluster:.*"}'
        # Federate key infrastructure alerts
        - 'up{job="kubernetes-apiservers"}'
    static_configs:
      - targets:
          - prometheus-eu-west.monitoring.svc:9090
          - prometheus-us-east.monitoring.svc:9090
          - prometheus-ap-south.monitoring.svc:9090

Federation works well for multi-region global dashboards and cross-cluster alerting on aggregated signals. Its limitations are significant, though: the /federate endpoint is a point-in-time snapshot, so you cannot run range queries against federated data effectively. It also creates a single point of failure at the global layer and does not provide true long-term storage. For those requirements, remote_write is the better path.

Remote Write: Streaming to Durable Storage

Remote write allows Prometheus to stream all ingested samples to an external storage backend in real time. The external backend handles long-term retention, multi-tenancy, and global query federation. Prometheus itself becomes a stateless collection agent that maintains only a short local retention window for resilience against network outages.

remote_write:
  - url: https://thanos-receive.monitoring.svc:19291/api/v1/receive
    # Authentication for the remote endpoint
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/secrets/remote-write-password

    # Tune the write queue for throughput vs. latency
    queue_config:
      # Number of shards (parallel write connections)
      max_shards: 200
      min_shards: 1
      # Samples to batch before flushing
      max_samples_per_send: 500
      # Time to wait before flushing an incomplete batch
      batch_send_deadline: 5s
      # In-memory buffer capacity per shard
      capacity: 2500
      # How long to retry failed writes
      min_backoff: 30ms
      max_backoff: 5s

    # Metadata configuration
    metadata_config:
      send: true
      send_interval: 1m

    # Filter what gets remote-written (reduce egress)
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'go_gc_.*|go_memstats_.*'
        action: drop

The queue_config tuning is critical and frequently misunderstood. Each shard maintains its own connection to the remote endpoint and its own in-memory queue. Increasing max_shards increases parallelism and throughput but also increases memory consumption and load on the remote endpoint. The right values depend heavily on your sample ingestion rate and network latency to the remote endpoint. Monitor prometheus_remote_storage_queue_highest_sent_timestamp_seconds versus prometheus_remote_storage_highest_timestamp_in_seconds — the lag between them tells you how far behind your remote write queue is.

Long-Term Solutions: Thanos vs Grafana Mimir vs VictoriaMetrics

For production systems that need long-term storage, global query capability, high availability, and genuine horizontal scalability, purpose-built solutions are the right answer. Three projects dominate this space: Thanos, Grafana Mimir, and VictoriaMetrics. They share similar goals but differ significantly in architecture, operational complexity, and trade-offs.

Criterion | Thanos | Grafana Mimir | VictoriaMetrics
Architecture | Sidecar + object store; modular components | Fully distributed; Cortex-derived microservices | Single binary or cluster mode
Storage backend | Any S3-compatible object store | Any S3-compatible object store | Own TSDB format on local or object store
PromQL compatibility | Full PromQL; own query engine | Full PromQL; Mimir-specific extensions | MetricsQL (PromQL superset)
Operational complexity | Medium — multiple components, each simple | High — many microservices with complex config | Low — minimal components, simple config
Ingest scalability | Scales via Thanos Receive fan-out | Horizontally scalable distributors + ingesters | Excellent; handles millions of samples/sec per node
Query performance | Good; Store Gateway caches object store data | Good; query sharding and caching built in | Excellent; highly optimized query engine
Multi-tenancy | Limited; tenant isolation via external labels | Native; per-tenant limits and isolation | Enterprise only; basic in cluster mode
Deduplication | Built-in; replica dedup at query time | Built-in; ingest-time and query-time dedup | Built-in; dedup with downsampling
Downsampling | Yes; Thanos Compactor handles it | Yes; configurable per tenant | Yes; automatic with vmbackupmanager
License | Apache 2.0 (fully open source) | AGPL-3.0 (open source) + enterprise tier | Apache 2.0 (community); proprietary enterprise
Best fit | Teams already running Prometheus wanting minimal disruption | Large orgs needing multi-tenant SaaS-grade monitoring | Teams prioritizing simplicity and raw performance

Thanos: The Incremental Path

Thanos integrates with existing Prometheus deployments through a sidecar process that runs alongside each Prometheus pod. The sidecar uploads completed TSDB blocks to object storage (S3, GCS, Azure Blob) and exposes a gRPC Store API that Thanos Query uses to federate queries across all Prometheus instances plus historical data in the object store. This makes Thanos the lowest-friction path for teams with existing Prometheus infrastructure.

Thanos Receive is an alternative ingest path that accepts remote_write directly, which is useful when you want to decouple Prometheus instances from the query layer or implement active-active HA without relying on Prometheus replication. Thanos Compactor handles block compaction and downsampling on the object store, creating 5-minute and 1-hour resolution downsamples automatically for efficient long-range queries.

Grafana Mimir: Enterprise-Grade Multi-Tenancy

Mimir is a fork of Cortex, rewritten by Grafana Labs to address operational complexity issues in Cortex’s architecture. It follows the same microservices pattern — Distributor, Ingester, Querier, Query Frontend, Store Gateway, Compactor, Ruler — but with significantly improved defaults and a monolithic deployment mode that simplifies small-scale deployments. Mimir’s headline feature is native multi-tenancy with per-tenant cardinality limits, query limits, and ingestion rate limits enforced at the distributor layer.

Mimir is the right choice when you need to run monitoring as an internal platform service for multiple teams or business units, each with independent resource quotas and data isolation. The operational overhead is substantial, but for large organizations it is justified by the isolation and governance capabilities.

VictoriaMetrics: Simplicity and Raw Performance

VictoriaMetrics takes a fundamentally different approach: rather than building on top of Prometheus’s TSDB format, it implements its own highly optimized storage engine. The result is dramatically better compression (often 5–10x better than Prometheus TSDB) and query performance that consistently outperforms Thanos and Mimir in benchmarks, particularly for high-cardinality workloads and large time ranges. The single-node binary handles workloads that would require a full Thanos cluster, and the cluster version adds horizontal scalability with fewer moving parts than Thanos or Mimir.

VictoriaMetrics also supports MetricsQL, a superset of PromQL that adds useful functions like outlierIQR(), limitOffset(), and improved histogram handling. Grafana datasource compatibility is maintained through a PromQL-compatible API, so existing dashboards work without modification.

Practical Guide: Choosing Your Scaling Approach

The right solution depends on your current scale, team capacity, and trajectory. This is not a one-size-fits-all decision. Here is a pragmatic framework for matching the solution to the problem.

Stage 1: Under 1 Million Active Series

A single Prometheus instance with proper tuning should handle this comfortably. Focus on recording rules to eliminate expensive dashboard queries, implement metric_relabel_configs to drop unused metrics, and set sample_limit guards. Increase Prometheus memory limits to give it adequate headroom (at minimum 8 GB, ideally 16 GB for instances approaching 1M series). Set --storage.tsdb.retention.time to the minimum that satisfies your compliance and debugging needs — 15 days is often enough if you have remote_write configured to a longer-term store.

Stage 2: 1–5 Million Active Series

At this scale, a single instance is viable but requires vertical scaling and aggressive optimization. Consider sharding your Prometheus deployment by functional area: one instance for infrastructure metrics, one for application metrics, one for business metrics. This is horizontal scaling via functional decomposition, not true distributed architecture. Add remote_write to object storage for long-term retention. If you are running Kubernetes, the Prometheus Operator with multiple Prometheus custom resources per namespace group is a clean implementation of this pattern.
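With the Prometheus Operator, functional sharding is expressed as separate Prometheus custom resources, each selecting a different group of ServiceMonitors. A hedged sketch of one shard; the label convention (monitoring-tier) and resource sizes are illustrative, not prescriptive.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: infra-metrics
  namespace: monitoring
spec:
  replicas: 2
  retention: 15d
  # Only scrape ServiceMonitors labelled for the infrastructure shard;
  # a sibling Prometheus resource selects monitoring-tier: application
  serviceMonitorSelector:
    matchLabels:
      monitoring-tier: infrastructure
  resources:
    requests:
      memory: 8Gi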

This is also the stage where VictoriaMetrics single-node becomes compelling — it can handle this range comfortably with far less RAM than Prometheus and simpler operations than a full distributed system.

Stage 3: 5 Million+ Active Series or Global Requirements

At this scale, a distributed architecture is necessary. Your choice among Thanos, Mimir, and VictoriaMetrics Cluster depends primarily on:

  • Existing Prometheus investment + incremental migration: Thanos Sidecar is the path of least resistance. Your existing Prometheus instances keep working; you add sidecars and deploy Thanos query components.
  • Multi-tenant platform with governance requirements: Grafana Mimir, accepting the operational complexity in exchange for native tenant isolation and limits.
  • Maximum performance with minimal operational burden: VictoriaMetrics Cluster, replacing Prometheus entirely or alongside it via remote_write, with dramatically simpler operations than Thanos or Mimir.
  • Multi-region, cross-cloud global monitoring: Thanos or Mimir, both have mature multi-region architectures; VictoriaMetrics Enterprise has similar capabilities but is not open source.

Complementary Configuration: Thanos Sidecar Example

For teams adopting Thanos, the sidecar configuration alongside a Prometheus deployment looks like this in a Kubernetes environment:

# Thanos sidecar configuration (as part of Prometheus pod spec)
containers:
  - name: prometheus
    image: prom/prometheus:v2.48.0
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      # Keep 2h locally; Thanos handles long-term
      - --storage.tsdb.retention.time=2h
      # Thanos requires min-block-duration = max-block-duration for sidecar
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle

  - name: thanos-sidecar
    image: quay.io/thanos/thanos:v0.32.0
    args:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://localhost:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      # Object store configuration
      - --objstore.config-file=/etc/thanos/objstore.yml
    volumeMounts:
      - name: prometheus-data
        mountPath: /prometheus
      - name: thanos-objstore-config
        mountPath: /etc/thanos

---
# Object store configuration (s3-compatible)
# /etc/thanos/objstore.yml
type: S3
config:
  bucket: my-thanos-metrics
  endpoint: s3.eu-west-1.amazonaws.com
  region: eu-west-1
  # Use IAM role or provide credentials via environment
  access_key: ""
  secret_key: ""

VictoriaMetrics as Remote Write Target

If you choose VictoriaMetrics as your remote storage backend, the integration with existing Prometheus instances is straightforward. VictoriaMetrics exposes a remote_write compatible endpoint at /api/v1/write:

# prometheus.yml — remote write to VictoriaMetrics
remote_write:
  - url: http://victoriametrics:8428/api/v1/write
    queue_config:
      max_samples_per_send: 10000
      capacity: 20000
      max_shards: 30

# VictoriaMetrics single-node startup (Docker Compose example)
services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:v1.95.1
    command:
      - -storageDataPath=/victoria-metrics-data
      # Retain 1 year of data
      - -retentionPeriod=12
      # Enable deduplication (for HA Prometheus pairs)
      - -dedup.minScrapeInterval=15s
      # Memory limit
      - -memory.allowedPercent=60
    ports:
      - "8428:8428"
    volumes:
      - vm-data:/victoria-metrics-data

volumes:
  vm-data:

VictoriaMetrics also exposes a Prometheus-compatible query API at /api/v1/query and /api/v1/query_range, so Grafana datasources pointing at it need only a URL change — no plugin installation required for basic use. For MetricsQL-specific functions, use the VictoriaMetrics datasource plugin available in Grafana’s plugin catalog.
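For completeness, a provisioned Grafana datasource pointing at VictoriaMetrics looks like any other Prometheus datasource; only the URL differs. The file path and service address below are placeholders.

# grafana/provisioning/datasources/victoriametrics.yaml
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus        # the PromQL-compatible API works with the standard Prometheus type
    access: proxy
    url: http://victoriametrics:8428
    isDefault: false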

Observing Your Prometheus Health

Before implementing any of these solutions, establish a baseline understanding of your Prometheus instance’s current health. The following PromQL expressions give you immediate visibility into the key indicators:

# Total active time series in the head block
prometheus_tsdb_head_series

# Series created vs. removed (churn indicator)
rate(prometheus_tsdb_head_series_created_total[5m])
rate(prometheus_tsdb_head_series_removed_total[5m])

# Memory usage of the head block chunks
prometheus_tsdb_head_chunks_storage_size_bytes

# Remote write lag (seconds behind)
(
  prometheus_remote_storage_highest_timestamp_in_seconds
  - prometheus_remote_storage_queue_highest_sent_timestamp_seconds
)

# Top cardinality contributors (requires Prometheus 2.14+)
# Run this in Prometheus /api/v1/query:
topk(20,
  count by (__name__) ({__name__!=""})
)

# Alert evaluation lag
rate(prometheus_rule_evaluation_duration_seconds_sum[5m])
/ rate(prometheus_rule_evaluation_duration_seconds_count[5m])

Prometheus also exposes a /api/v1/status/tsdb endpoint that returns cardinality statistics including the top 10 metrics by series count and the top 10 label names by cardinality. This is invaluable for identifying which specific metrics or labels are causing problems and should be your first stop when investigating a new cardinality issue.

Frequently Asked Questions

How do I identify which metrics are causing my cardinality explosion?

Start with the /api/v1/status/tsdb endpoint on your Prometheus instance. It returns a JSON response with seriesCountByMetricName, seriesCountByLabelValuePair, and labelValueCountByLabelName arrays, showing which metrics, label pairs, and label names contribute most to your series count and label cardinality. This points you directly at the offending metrics and labels without any external tooling. Complement this with topk(20, count by (__name__)({__name__!=""})) in PromQL, which gives you the same information in a queryable format you can alert on. Once you know the metric name, query count by (label1, label2) (your_metric_name) replacing label pairs to identify which specific label dimensions are driving the high count.
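A quick way to pull those statistics from the command line; the Prometheus address is a placeholder for your own instance.

# Top metrics by active series count, straight from the TSDB status endpoint
curl -s http://prometheus.monitoring.svc:9090/api/v1/status/tsdb \
  | jq '.data.seriesCountByMetricName[:10]'

# Top label name/value pairs by series count
curl -s http://prometheus.monitoring.svc:9090/api/v1/status/tsdb \
  | jq '.data.seriesCountByLabelValuePair[:10]'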

Can I run Prometheus in HA without external dependencies?

Yes, but with important caveats. The standard HA pattern for standalone Prometheus is to run two identical Prometheus instances scraping the same targets. Both instances collect data independently, and Alertmanager deduplicates alerts from both using its mesh clustering (run multiple Alertmanager instances in a cluster and point both Prometheus instances at all of them). This provides alerting HA — alerts fire even if one Prometheus instance is down. It does not provide query HA in the traditional sense, because each instance has its own independent data and queries against a failed instance simply fail. Dashboards pointing at a specific instance will show gaps during that instance’s downtime. For true query HA with failover and deduplication, you need Thanos Query (which can deduplicate replica series at query time using the replica external label) or a similar solution. Running Prometheus without any external dependencies means accepting these query HA limitations.
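In that pattern the only configuration difference between the two replicas is an external label identifying each one, which Thanos Query (or a similar layer) later uses for deduplication. A minimal sketch; the label names and values are conventions rather than requirements.

# prometheus.yml on the first replica
global:
  external_labels:
    cluster: prod-eu-west    # illustrative
    replica: prometheus-0    # the second replica sets replica: prometheus-1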

What is a safe maximum cardinality for a single Prometheus instance?

There is no universal number — it depends heavily on your scrape interval, available RAM, and query patterns. A practical guideline: allocate 3–4 GB of RAM per million active series for the head block alone, then add 50% headroom for query processing. A Prometheus instance with 16 GB of RAM can comfortably handle 2–3 million active series under typical workloads. Beyond 5 million series, even well-resourced single instances start showing query performance degradation that impacts alert evaluation reliability. The more meaningful limit to enforce operationally is series churn rate: an instance creating more than 100,000 new series per minute will struggle regardless of total series count, because the head block indexing operations become a bottleneck. Monitor rate(prometheus_tsdb_head_series_created_total[5m]) and treat sustained values above 50,000/minute as a warning condition.

Should I use Thanos Sidecar or Thanos Receive for ingestion?

The choice comes down to whether you want to keep Prometheus as the authoritative ingest layer or move toward a push-based architecture. Thanos Sidecar is the simpler, lower-risk option: Prometheus continues operating normally, the sidecar uploads completed blocks to object storage in the background, and you gain long-term storage and global query capability with minimal disruption. The drawback is that Prometheus must have local storage for at least 2 hours (one block duration), and the sidecar requires that min-block-duration equals max-block-duration, which prevents Prometheus from doing its own compaction. Thanos Receive accepts remote_write from any Prometheus instance, which enables active-active HA setups where multiple Prometheus replicas write to a Receive hashring simultaneously, and Receive handles deduplication. This is more complex to operate but provides better ingest-side redundancy. For most teams starting with Thanos, Sidecar is the right first step. Receive makes sense when you are building a centralized monitoring platform that accepts writes from many Prometheus instances across different clusters or environments.

Is it worth migrating from Thanos to VictoriaMetrics or Mimir once you are already running Thanos?

Migration from Thanos to an alternative should be driven by specific pain points, not by benchmark numbers alone. If your team is spending significant time operating Thanos (debugging query Store Gateway cache issues, managing compactor conflicts, handling block upload failures), and your primary need is simplicity and query performance rather than multi-tenancy, VictoriaMetrics is worth evaluating seriously. The migration path is smooth: run VictoriaMetrics alongside Thanos temporarily, migrate remote_write targets to VictoriaMetrics, and decommission Thanos once you are satisfied. Historical data in your object store can be imported using VictoriaMetrics’s vmctl tool. If your pain point is multi-tenancy and governance — multiple teams with independent data isolation, per-tenant rate limits, chargeback requirements — Mimir is the right destination and the operational complexity is justified. The one scenario where staying with Thanos is usually the right call is when your organization has invested heavily in Thanos tooling, has stable operations, and does not have specific unmet needs. Migration carries real costs in engineering time and operational risk; make sure the benefits are concrete and quantified before committing.

Gateway API Provider Support in 2026: A Critical Evaluation

The Kubernetes Gateway API is no longer a future concept—it’s the present standard for traffic management. With the deprecation signals around the NGINX-based Ingress controllers marking a definitive shift, platform teams and architects are now faced with a critical decision: which Gateway API provider to adopt. The official implementations page lists numerous options, but the real-world picture is one of fragmented support, varying stability, and significant gaps that can derail multi-cluster strategies.

In this evaluation, we move beyond marketing checklists to analyze the practical state of Gateway API support across major cloud providers, ingress controllers, and service meshes. We’ll examine which versions are truly production-ready, where the interoperability pitfalls lie, and what you must account for before standardizing across your infrastructure.

The Gateway API Maturity Spectrum: From Experimental to Standard

Not all Gateway API resources are created equal. The API’s versioning model, in which resources graduate from the Experimental release channel to the Standard channel and individual features are classified as Core, Extended, or Implementation-specific, means provider support is inherently uneven. An implementation might fully support the stable Gateway and HTTPRoute resources while offering only partial or experimental backing for GRPCRoute or TCPRoute.

This creates a fundamental challenge for architects: designing for the lowest common denominator or accepting provider-specific constraints. The decision hinges on accurately mapping your traffic management requirements (HTTP, TLS termination, gRPC, TCP/UDP load balancing) against what each provider actually delivers in a stable form.

Core API Support: The Foundation

Most providers now support the v1 (GA) versions of the foundational resources:

  • GatewayClass & Gateway: Nearly universal support for v1. These are the control plane resources for provisioning and configuring load balancers.
  • HTTPRoute: Universal support for v1. This is the workhorse for HTTP/HTTPS traffic routing and is considered the most stable.

However, support for other route types reveals the fragmentation:

  • GRPCRoute: Often in beta or experimental stages. Critical for modern microservices architectures but not yet universally reliable.
  • TCPRoute & UDPRoute: Patchy support. Some providers implement them as beta, others ignore them entirely, forcing fallbacks to provider-specific annotations or custom resources.
  • TLSRoute: Frequently tied to specific certificate management integrations (e.g., cert-manager).

Major Provider Deep Dive: Implementation Realities

AWS Elastic Kubernetes Service (EKS)

AWS offers an official Gateway API controller for EKS. Its support is pragmatic but currently limited:

  • Supported Resources: GatewayClass, Gateway, HTTPRoute, and GRPCRoute (all v1beta1 as of early 2024). Note the use of v1beta1 for GRPCRoute, indicating it’s not yet at GA stability.
  • Underlying Infrastructure: Maps directly to AWS Application Load Balancer (ALB) and Network Load Balancer (NLB). This is a strength (managed AWS services) and a constraint (you inherit ALB/NLB feature limits).
  • Critical Gap: No support for TCPRoute or UDPRoute. If your workload requires raw TCP/UDP load balancing, you must use the legacy Kubernetes Service type LoadBalancer or a different ingress controller alongside the Gateway API controller, creating a disjointed management model.

Google Kubernetes Engine (GKE) & Azure Kubernetes Service (AKS)

Both Google and Azure have integrated Gateway API support directly into their managed Kubernetes offerings, often with a focus on their global load-balancing infrastructures.

  • GKE: Offers the GKE Gateway controller. It supports v1 resources and can provision Google Cloud Global External Load Balancers. Its integration with Google’s certificate management and CDN is a key advantage. However, advanced routing features may require GCP-specific backend configs.
  • AKS: Provides the Application Gateway Ingress Controller (AGIC) with Gateway API support, mapping to Azure Application Gateway. Support for newer route types like GRPCRoute has historically lagged behind other providers.

The pattern here is clear: cloud providers implement the Gateway API as a facade over their existing, proprietary load-balancing products. This ensures stability and performance but can limit portability and advanced cross-provider features.

NGINX & Kong Ingress Controller

These third-party, cluster-based controllers offer a different value proposition: consistency across any Kubernetes distribution, including on-premises.

  • NGINX: With its stable Ingress APIs deprecated in favor of Gateway API, its Gateway API implementation is now the primary path forward. It generally has excellent support for the full range of experimental and standard resources, as it’s not constrained by a cloud vendor’s underlying service. This makes it a strong choice for hybrid or multi-cloud deployments where feature parity is crucial.
  • Kong Ingress Controller: Kong has been an early and comprehensive supporter of the Gateway API, often implementing features quickly. It leverages Kong Gateway’s extensive plugin ecosystem, which can be a major draw but also introduces vendor lock-in.

Critical Gaps for Enterprise Architects

Beyond checking resource support boxes, several deeper gaps can impact production deployments, especially in complex environments.

1. Multi-Cluster & Hybrid Environment Support

The Gateway API specification includes concepts like ReferenceGrant for cross-namespace and future cross-cluster routing. In practice, very few providers have robust, production-ready multi-cluster stories. Most implementations assume a single cluster. If your architecture spans multiple clusters (for isolation, geography, or failure domains), you will likely need to:

  • Manage separate Gateway resources per cluster.
  • Use an external global load balancer (like a cloud DNS/GSLB) to distribute traffic across cluster-specific gateways.

This negates some of the API’s promise of a unified, abstracted configuration.
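
As a reference, the ReferenceGrant mentioned above handles cross-namespace references within a single cluster. A minimal sketch, assuming an HTTPRoute in an apps namespace that needs to target a backend Service owned by a data namespace:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-routes-from-apps
  namespace: data                 # the namespace that owns the referenced Service
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: apps             # where the referencing HTTPRoute lives
  to:
    - group: ""
      kind: Service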

2. Policy Attachment and Extension Consistency

Gateway API is designed to be extended through policy attachment (e.g., for rate limiting, WAF rules, authentication). There is no standard for how these policies are implemented. One provider might use a custom RateLimitPolicy CRD, while another might rely on annotations or a separate policy engine. This creates massive configuration drift and vendor lock-in, breaking the portability goal.

3. Observability and Debugging Interfaces

While the API defines status fields, the richness of operational data—detailed error logs, granular metrics tied to API resources, distributed tracing integration—varies wildly. Some providers expose deep integration with their monitoring stack; others offer minimal visibility. You must verify that the provider’s observability model meets your SRE team’s needs.

Evaluation Framework: Questions for Your Team

Before selecting a provider, work through this technical checklist:

  1. Route Requirements: Do we need stable support for HTTP only, or also gRPC, TCP, UDP? Is beta support acceptable for non-HTTP routes?
  2. Infrastructure Model: Do we want a cloud-managed load balancer (simpler, less control) or a cluster-based controller (more portable, more operational overhead)?
  3. Multi-Cluster Future: Is our architecture single-cluster today but likely to expand? Does the provider have a credible roadmap for multi-cluster Gateway API?
  4. Policy Needs: What advanced policies (auth, WAF, rate limiting) are required? How does the provider implement them? Can we live with vendor-specific policy CRDs?
  5. Observe & Debug: What logging, metrics, and tracing are exposed for Gateway API resources? Do they integrate with our existing observability platform?
  6. Upgrade Path: What is the provider’s track record for supporting new Gateway API releases? How painful are version upgrades?

Strategic Recommendations

Based on the current landscape, here are pragmatic paths forward:

  • For Single-Cloud Deployments: Start with your cloud provider’s native controller (AWS, GKE, AKS). It’s the path of least resistance and best integration with other cloud services (IAM, certificates, monitoring). Just be acutely aware of its specific limitations regarding unsupported route types.
  • For Hybrid/Multi-Cloud or On-Premises: Standardize on a portable, cluster-based controller such as NGINX or Kong. The consistency across environments will save significant operational complexity, even if it means forgoing some cloud-native integrations.
  • For Greenfield Projects: Design your applications and configurations against the stable v1 resources (Gateway, HTTPRoute) only. Treat any use of beta/experimental resources as a known risk that may require refactoring later.
  • Always Have an Exit Plan: Isolate Gateway API configuration YAMLs from provider-specific policies and annotations. This modularity will make migration less painful when the next generation of providers emerges or when you need to switch.
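
One way to make the exit-plan recommendation concrete is the repository layout itself. A sketch of one possible Kustomize-style structure (directory and file names are hypothetical) that keeps portable Gateway API resources apart from provider-specific policies:

gateway-config/
├── base/                        # portable, Standard-channel resources only
│   ├── gateway.yaml             # Gateway and GatewayClass reference
│   ├── httproutes.yaml          # HTTPRoute definitions
│   └── kustomization.yaml
└── overlays/
    ├── aws/                     # provider-specific pieces live here
    │   ├── target-group-policy.yaml
    │   └── kustomization.yaml
    └── gke/
        ├── backend-config.yaml
        └── kustomization.yaml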

The Gateway API’s evolution is a net positive for the Kubernetes ecosystem, offering a far more expressive model than the original Ingress. However, in 2026, the provider landscape is still maturing. Support is broad but not deep, and critical gaps in multi-cluster management and policy portability remain. The successful architect will choose a provider not based on a feature checklist, but based on how well its specific constraints and capabilities align with their organization’s immediate traffic patterns and long-term platform strategy. The era of a universal, write-once-run-anywhere Gateway API configuration is not yet here—but with careful, informed provider selection, you can build a robust foundation for it.

Kubernetes Housekeeping: How to Clean Up Orphaned ConfigMaps and Secrets

Kubernetes Housekeeping: How to Clean Up Orphaned ConfigMaps and Secrets

If you’ve been running Kubernetes clusters for any meaningful amount of time, you’ve likely encountered a familiar problem: orphaned ConfigMaps and Secrets piling up in your namespaces. These abandoned resources don’t just clutter your cluster—they introduce security risks, complicate troubleshooting, and can even impact cluster performance as your resource count grows.

The reality is that Kubernetes doesn’t automatically clean up ConfigMaps and Secrets when the workloads that reference them are deleted. This gap in Kubernetes’ native garbage collection creates a housekeeping problem that every production cluster eventually faces. In this article, we’ll explore why orphaned resources happen, how to detect them, and most importantly, how to implement sustainable cleanup strategies that prevent them from accumulating in the first place.

Understanding the Orphaned Resource Problem

What Are Orphaned ConfigMaps and Secrets?

Orphaned ConfigMaps and Secrets are configuration resources that no longer have any active references from Pods, Deployments, StatefulSets, or other workload resources in your cluster. They typically become orphaned when:

  • Applications are updated and new ConfigMaps are created while old ones remain
  • Deployments are deleted but their associated configuration resources aren’t
  • Failed rollouts leave behind unused configuration versions
  • Development and testing workflows create temporary resources that never get cleaned up
  • CI/CD pipelines generate unique ConfigMap names (often with hash suffixes) on each deployment

Why This Matters for Production Clusters

While a few orphaned ConfigMaps might seem harmless, the problem compounds over time and introduces real operational challenges:

Security Risks: Orphaned Secrets can contain outdated credentials, API keys, or certificates that should no longer be accessible. If these aren’t removed, they remain attack vectors for unauthorized access—especially problematic if RBAC policies grant broad read access to Secrets within a namespace.

Cluster Bloat: Kubernetes stores these resources in etcd, your cluster’s backing store. As the number of orphaned resources grows, etcd size increases, potentially impacting cluster performance and backup times. In extreme cases, this can contribute to etcd performance degradation or even hit storage quotas.

Operational Complexity: When troubleshooting issues or reviewing configurations, sifting through dozens of unused ConfigMaps makes it harder to identify which resources are actually in use. This “configuration noise” slows down incident response and increases cognitive load for your team.

Cost Implications: While individual ConfigMaps are small, at scale they contribute to storage costs and can trigger alerts in cost monitoring systems, especially in multi-tenant environments where resource quotas matter.

Detecting Orphaned ConfigMaps and Secrets

Before you can clean up orphaned resources, you need to identify them. Let’s explore both manual detection methods and automated tooling approaches.

Manual Detection with kubectl

The simplest approach uses kubectl to cross-reference ConfigMaps and Secrets against active workload resources. Here’s a basic script to identify potentially orphaned ConfigMaps:

#!/bin/bash
# detect-orphaned-configmaps.sh
# Identifies ConfigMaps not referenced by any active Pods

NAMESPACE=${1:-default}

echo "Checking for orphaned ConfigMaps in namespace: $NAMESPACE"
echo "---"

# Get all ConfigMaps in the namespace
CONFIGMAPS=$(kubectl get configmaps -n $NAMESPACE -o jsonpath='{.items[*].metadata.name}')

for cm in $CONFIGMAPS; do
    # Skip kube-root-ca.crt as it's system-managed
    if [[ "$cm" == "kube-root-ca.crt" ]]; then
        continue
    fi

    # Check if any Pod references this ConfigMap
    REFERENCED=$(kubectl get pods -n $NAMESPACE -o json | \
        jq -r --arg cm "$cm" '.items[] |
        select(
            (.spec.volumes[]?.configMap.name == $cm) or
            (.spec.containers[].env[]?.valueFrom.configMapKeyRef.name == $cm) or
            (.spec.containers[].envFrom[]?.configMapRef.name == $cm)
        ) | .metadata.name' | head -1)

    if [[ -z "$REFERENCED" ]]; then
        echo "Orphaned: $cm"
    fi
done

A similar script for Secrets would look like this:

#!/bin/bash
# detect-orphaned-secrets.sh

NAMESPACE=${1:-default}

echo "Checking for orphaned Secrets in namespace: $NAMESPACE"
echo "---"

SECRETS=$(kubectl get secrets -n $NAMESPACE -o jsonpath='{.items[*].metadata.name}')

for secret in $SECRETS; do
    # Skip service account tokens and system secrets
    SECRET_TYPE=$(kubectl get secret $secret -n $NAMESPACE -o jsonpath='{.type}')
    if [[ "$SECRET_TYPE" == "kubernetes.io/service-account-token" ]]; then
        continue
    fi

    # Check if any Pod references this Secret
    REFERENCED=$(kubectl get pods -n $NAMESPACE -o json | \
        jq -r --arg secret "$secret" '.items[] |
        select(
            (.spec.volumes[]?.secret.secretName == $secret) or
            (.spec.containers[].env[]?.valueFrom.secretKeyRef.name == $secret) or
            (.spec.containers[].envFrom[]?.secretRef.name == $secret) or
            (.spec.imagePullSecrets[]?.name == $secret)
        ) | .metadata.name' | head -1)

    if [[ -z "$REFERENCED" ]]; then
        echo "Orphaned: $secret"
    fi
done

Important caveat: These scripts only check currently running Pods. They won’t catch ConfigMaps or Secrets referenced by Deployments, StatefulSets, or DaemonSets that might currently have zero replicas. For production use, you’ll want to check against all workload resource types.
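
A hedged extension of the same idea, checking Deployments, StatefulSets, and DaemonSets instead of only running Pods, could look like the fragment below. It reuses the NAMESPACE and CONFIGMAPS variables from the script above and assumes jq is available:

# ConfigMap names referenced by Deployments, StatefulSets and DaemonSets in the namespace
WORKLOAD_REFERENCED=$(kubectl get deployments,statefulsets,daemonsets -n "$NAMESPACE" -o json | \
    jq -r '.items[].spec.template.spec |
    [.volumes[]?.configMap.name,
     .containers[].env[]?.valueFrom.configMapKeyRef.name,
     .containers[].envFrom[]?.configMapRef.name] |
    .[] | select(. != null)' | sort -u)

for cm in $CONFIGMAPS; do
    # A ConfigMap is a cleanup candidate only if no workload template references it
    if ! echo "$WORKLOAD_REFERENCED" | grep -qx "$cm"; then
        echo "Not referenced by any Deployment/StatefulSet/DaemonSet: $cm"
    fi
done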

Automated Detection with Specialized Tools

Several open-source tools have emerged to solve this problem more comprehensively:

Kor: Comprehensive Unused Resource Detection

Kor is a purpose-built tool for finding unused resources across your Kubernetes cluster. It checks not just ConfigMaps and Secrets, but also PVCs, Services, and other resource types.

# Install Kor
brew install kor

# Scan for unused ConfigMaps and Secrets
kor all --namespace production --output json

# Check specific resource types
kor configmap --namespace production
kor secret --namespace production --exclude-namespaces kube-system,kube-public

Kor works by analyzing resource relationships and identifying anything without dependent objects. It’s particularly effective because it understands Kubernetes resource hierarchies and checks against Deployments, StatefulSets, and DaemonSets—not just running Pods.

Popeye: Cluster Sanitization Reports

Popeye scans your cluster and generates reports on resource health, including orphaned resources. While broader in scope than just ConfigMap cleanup, it provides valuable context:

# Install Popeye
brew install derailed/popeye/popeye

# Scan cluster
popeye --output json --save

# Focus on specific namespace
popeye --namespace production

Custom Controllers with Kubernetes APIs

For more sophisticated detection, you can build custom controllers using client-go that continuously monitor for orphaned resources. This approach works well when integrated with your existing observability stack:

// Pseudocode example
func detectOrphanedConfigMaps(namespace string) []string {
    configMaps := listConfigMaps(namespace)
    deployments := listDeployments(namespace)
    statefulSets := listStatefulSets(namespace)
    daemonSets := listDaemonSets(namespace)

    referenced := make(map[string]bool)

    // Check all workload types for ConfigMap references
    for _, deploy := range deployments {
        for _, cm := range getReferencedConfigMaps(deploy) {
            referenced[cm] = true
        }
    }
    // ... repeat for other workload types

    orphaned := []string{}
    for _, cm := range configMaps {
        if !referenced[cm.Name] {
            orphaned = append(orphaned, cm.Name)
        }
    }

    return orphaned
}

Prevention Strategies: Stop Orphans Before They Start

The best cleanup strategy is prevention. By implementing proper resource management patterns from the beginning, you can minimize orphaned resources in the first place.

Use Owner References for Automatic Cleanup

Kubernetes provides a built-in mechanism for resource lifecycle management through owner references. When properly configured, child resources are automatically deleted when their owner is removed.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
  ownerReferences:
    - apiVersion: apps/v1
      kind: Deployment
      name: myapp
      uid: d9607e19-f88f-11e6-a518-42010a800195
      controller: true
      blockOwnerDeletion: true
data:
  app.properties: |
    database.url=postgres://db:5432

In practice, owner references are usually set by controllers and operators that create ConfigMaps on your behalf rather than written by hand. Combined with deployment tooling that prunes resources it no longer manages, this is one reason GitOps workflows tend to have fewer orphaned resources than imperative deployment approaches.

Implement Consistent Labeling Standards

Labels make it much easier to identify resource relationships and track ownership:

apiVersion: v1
kind: ConfigMap
metadata:
  name: api-gateway-config-v2
  labels:
    app: api-gateway
    component: configuration
    version: v2
    managed-by: argocd
    owner: platform-team
data:
  config.yaml: |
    # configuration here

With consistent labeling, you can easily query for ConfigMaps associated with specific applications:

# Find all ConfigMaps for a specific app
kubectl get configmaps -l app=api-gateway

# Clean up old versions
kubectl delete configmaps -l app=api-gateway,version=v1

Adopt GitOps Practices

GitOps tools like ArgoCD and Flux excel at preventing orphaned resources because they maintain a clear desired state:

  • Declarative management: All resources are defined in Git
  • Automatic pruning: Tools can detect and remove resources not defined in Git
  • Audit trail: Git history shows when and why resources were created or deleted

ArgoCD’s sync policies can automatically prune resources:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
spec:
  syncPolicy:
    automated:
      prune: true  # Remove resources not in Git
      selfHeal: true

Use Kustomize ConfigMap Generators with Hashes

Kustomize’s ConfigMap generator feature appends content hashes to ConfigMap names, ensuring that configuration changes trigger new ConfigMaps:

# kustomization.yaml
configMapGenerator:
  - name: app-config
    files:
      - config.properties
generatorOptions:
  disableNameSuffixHash: false  # Include hash in name

This creates ConfigMaps like app-config-dk9g72hk5f. When you update the configuration, Kustomize creates a new ConfigMap with a different hash. Combined with kubectl apply --prune, old ConfigMaps are automatically removed:

kubectl apply --prune -k ./overlays/production \
  -l app=myapp

Set Resource Quotas

While quotas don’t prevent orphans, they create backpressure that forces teams to clean up:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: config-quota
  namespace: production
spec:
  hard:
    configmaps: "50"
    secrets: "50"

When teams hit quota limits, they’re incentivized to audit and remove unused resources.

Cleanup Strategies for Existing Orphaned Resources

For clusters that already have accumulated orphaned ConfigMaps and Secrets, here are practical cleanup approaches.

One-Time Manual Cleanup

For immediate cleanup, combine detection scripts with kubectl delete:

# Dry run first - review what would be deleted
./detect-orphaned-configmaps.sh production > orphaned-cms.txt
cat orphaned-cms.txt

# Manual review and cleanup
for cm in $(cat orphaned-cms.txt | grep "Orphaned:" | awk '{print $2}'); do
    kubectl delete configmap $cm -n production
done

Critical warning: Always do a dry run and manual review first. Some ConfigMaps might be referenced by workloads that aren’t currently running but will scale up later (HPA scaled to zero, CronJobs, etc.).
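
CronJobs are particularly easy to miss because their Pods only exist while a Job runs. A small, hedged check (again relying on jq) lists the ConfigMaps referenced from CronJob pod templates so you can exclude them from any deletion list:

# ConfigMaps referenced by CronJob pod templates in the production namespace
kubectl get cronjobs -n production -o json | \
    jq -r '.items[].spec.jobTemplate.spec.template.spec |
    [.volumes[]?.configMap.name,
     .containers[].env[]?.valueFrom.configMapKeyRef.name,
     .containers[].envFrom[]?.configMapRef.name] |
    .[] | select(. != null)' | sort -u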

Scheduled Cleanup with CronJobs

For ongoing maintenance, deploy a Kubernetes CronJob that runs cleanup scripts periodically:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: configmap-cleanup
  namespace: kube-system
spec:
  schedule: "0 2 * * 0"  # Weekly at 2 AM Sunday
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleanup-sa
          containers:
          - name: cleanup
            image: bitnami/kubectl:latest
            command:
            - /bin/bash
            - -c
            - |
              # Cleanup script here
              echo "Starting ConfigMap cleanup..."

              for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
                echo "Checking namespace: $ns"

                # Get all workload-referenced ConfigMaps
                REFERENCED_CMS=$(kubectl get deploy,sts,ds -n $ns -o json | \
                  jq -r '.items[].spec.template.spec |
                  [.volumes[]?.configMap.name,
                   .containers[].env[]?.valueFrom.configMapKeyRef.name,
                   .containers[].envFrom[]?.configMapRef.name] |
                  .[] | select(. != null)' | sort -u)

                ALL_CMS=$(kubectl get cm -n $ns -o jsonpath='{.items[*].metadata.name}')

                for cm in $ALL_CMS; do
                  if [[ "$cm" == "kube-root-ca.crt" ]]; then
                    continue
                  fi

                  if ! echo "$REFERENCED_CMS" | grep -q "^$cm$"; then
                    echo "Deleting orphaned ConfigMap: $cm in namespace: $ns"
                    kubectl delete cm $cm -n $ns
                  fi
                done
              done
          restartPolicy: OnFailure
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cleanup-sa
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cleanup-role
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets", "namespaces"]
  verbs: ["get", "list", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cleanup-binding
subjects:
- kind: ServiceAccount
  name: cleanup-sa
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cleanup-role
  apiGroup: rbac.authorization.k8s.io

Security consideration: This CronJob needs cluster-wide permissions to read workloads and delete ConfigMaps. Review and adjust the RBAC permissions based on your security requirements. Consider limiting to specific namespaces if you don’t need cluster-wide cleanup.
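
If cluster-wide delete rights are more than you are comfortable granting, a namespaced variant is straightforward. A sketch that scopes the same permissions to a single namespace (a hypothetical production namespace here) using Role and RoleBinding instead of their cluster-wide counterparts:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cleanup-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cleanup-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: cleanup-sa
  namespace: kube-system
roleRef:
  kind: Role
  name: cleanup-role
  apiGroup: rbac.authorization.k8s.io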

Integration with CI/CD Pipelines

Build cleanup into your deployment workflows. Here’s an example GitLab CI job:

cleanup_old_configs:
  stage: post-deploy
  image: bitnami/kubectl:latest
  script:
    - |
      # Delete ConfigMaps with old version labels after successful deployment
      kubectl delete configmap -n production \
        -l app=myapp,version!=v${CI_COMMIT_TAG}

    - |
      # Keep only the last 3 ConfigMap versions by timestamp
      kubectl get configmap -n production \
        -l app=myapp \
        --sort-by=.metadata.creationTimestamp \
        -o name | head -n -3 | xargs -r kubectl delete -n production
  only:
    - tags
  when: on_success

Safe Deletion Practices

When cleaning up ConfigMaps and Secrets, follow these safety guidelines:

  1. Dry run first: Always review what will be deleted before executing
  2. Backup before deletion: Export resources to YAML files before removing them
  3. Check age: Only delete resources older than a certain threshold (e.g., 30 days)
  4. Exclude system resources: Skip kube-system, kube-public, and other system namespaces
  5. Monitor for impact: Watch application metrics after cleanup to ensure nothing broke

Example backup and conditional deletion:

# Backup before deletion
kubectl get configmap -n production -o yaml > cm-backup-$(date +%Y%m%d).yaml

# Only delete ConfigMaps older than 30 days
kubectl get configmap -n production -o json | \
  jq -r --arg date "$(date -d '30 days ago' -u +%Y-%m-%dT%H:%M:%SZ)" \
  '.items[] | select(.metadata.creationTimestamp < $date) | .metadata.name' | \
  while read cm; do
    echo "Would delete: $cm (created: $(kubectl get cm $cm -n production -o jsonpath='{.metadata.creationTimestamp}'))"
    # Uncomment to actually delete:
    # kubectl delete configmap $cm -n production
  done

Advanced Patterns for Large-Scale Clusters

For organizations running multiple clusters or large multi-tenant platforms, housekeeping requires more sophisticated approaches.

Policy-Based Cleanup with OPA Gatekeeper

Use OPA Gatekeeper to enforce ConfigMap lifecycle policies at admission time:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: configmaprequiredlabels
spec:
  crd:
    spec:
      names:
        kind: ConfigMapRequiredLabels
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package configmaprequiredlabels

        violation[{"msg": msg}] {
          input.review.kind.kind == "ConfigMap"
          not input.review.object.metadata.labels["app"]
          msg := "ConfigMaps must have an 'app' label for lifecycle tracking"
        }

        violation[{"msg": msg}] {
          input.review.kind.kind == "ConfigMap"
          not input.review.object.metadata.labels["owner"]
          msg := "ConfigMaps must have an 'owner' label for lifecycle tracking"
        }

This policy prevents ConfigMaps without proper labels from being created, making future tracking and cleanup much easier.
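
Note that the ConstraintTemplate only defines the rule; to enforce it you also create a Constraint of the generated kind. A minimal sketch that applies it to ConfigMaps cluster-wide, with system-namespace exclusions as an assumption you would adjust:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: ConfigMapRequiredLabels
metadata:
  name: configmaps-must-have-lifecycle-labels
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["ConfigMap"]
    excludedNamespaces:
      - kube-system
      - kube-public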

Centralized Monitoring with Prometheus

Monitor orphaned resource metrics across your clusters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: orphan-detection-exporter
data:
  script.sh: |
    #!/bin/bash
    # Expose metrics for Prometheus scraping
    while true; do
      echo "# HELP k8s_orphaned_configmaps Number of orphaned ConfigMaps"
      echo "# TYPE k8s_orphaned_configmaps gauge"

      for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
        count=$(./detect-orphaned-configmaps.sh $ns | grep -c "Orphaned:")
        echo "k8s_orphaned_configmaps{namespace=\"$ns\"} $count"
      done

      sleep 300  # Update every 5 minutes
    done

Create alerts when orphaned resource counts exceed thresholds:

groups:
- name: kubernetes-housekeeping
  rules:
  - alert: HighOrphanedConfigMapCount
    expr: k8s_orphaned_configmaps > 20
    for: 24h
    labels:
      severity: warning
    annotations:
      summary: "High number of orphaned ConfigMaps in {{ $labels.namespace }}"
      description: "Namespace {{ $labels.namespace }} has {{ $value }} orphaned ConfigMaps"

Multi-Cluster Cleanup with Crossplane or Cluster API

For platform teams managing dozens or hundreds of clusters, extend cleanup automation across your entire fleet:

# Crossplane Composition for cluster-wide cleanup
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: cluster-cleanup-policy
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1
    kind: ClusterCleanupPolicy
  resources:
    - name: cleanup-cronjob
      base:
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
        spec:
          forProvider:
            manifest:
              apiVersion: batch/v1
              kind: CronJob
              # ... CronJob spec from earlier

Housekeeping Checklist for Production Clusters

Here’s a practical checklist to implement sustainable ConfigMap and Secret housekeeping:

Immediate Actions:

  • [ ] Run detection scripts to audit current orphaned resource count
  • [ ] Backup all ConfigMaps and Secrets before any cleanup
  • [ ] Manually review and delete obvious orphans (with team approval)
  • [ ] Document which ConfigMaps/Secrets are intentionally unused but needed

Short-term (1-4 weeks):

  • [ ] Implement consistent labeling standards across teams
  • [ ] Add owner references to all ConfigMaps and Secrets
  • [ ] Deploy scheduled CronJob for automated detection and reporting
  • [ ] Integrate cleanup steps into CI/CD pipelines

Long-term (1-3 months):

  • [ ] Adopt GitOps tooling (ArgoCD, Flux) with automated pruning
  • [ ] Implement OPA Gatekeeper policies for required labels
  • [ ] Set up Prometheus monitoring for orphaned resource metrics
  • [ ] Create runbooks for incident responders
  • [ ] Establish resource quotas per namespace
  • [ ] Conduct quarterly cluster hygiene reviews

Ongoing Practices:

  • [ ] Review orphaned resource reports weekly
  • [ ] Include cleanup tasks in sprint planning
  • [ ] Train new team members on resource lifecycle best practices
  • [ ] Update cleanup automation as cluster architecture evolves

Conclusion

Kubernetes doesn’t automatically clean up orphaned ConfigMaps and Secrets, but with the right strategies, you can prevent them from becoming a problem. The key is implementing a layered approach: use owner references and GitOps for prevention, deploy automated detection for ongoing monitoring, and run scheduled cleanup jobs for maintenance.

Start with detection to understand your current situation, then focus on prevention strategies like owner references and consistent labeling. For existing clusters with accumulated orphaned resources, implement gradual cleanup with proper safety checks rather than aggressive bulk deletion.

Remember that housekeeping isn’t a one-time task—it’s an ongoing operational practice. By building cleanup into your CI/CD pipelines and establishing clear resource ownership, you’ll maintain a clean, secure, and performant Kubernetes environment over time.

The tools and patterns we’ve covered here—from simple bash scripts to sophisticated policy engines—can be adapted to your organization’s scale and maturity level. Whether you’re managing a single cluster or a multi-cluster platform, investing in proper resource lifecycle management pays dividends in operational efficiency, security posture, and team productivity.

Frequently Asked Questions (FAQ)

Can Kubernetes automatically delete unused ConfigMaps and Secrets?

No. Kubernetes does not garbage-collect ConfigMaps or Secrets by default when workloads are deleted. Unless they have ownerReferences set, these resources remain in the cluster indefinitely and must be cleaned up manually or via automation.

Is it safe to delete ConfigMaps or Secrets that are not referenced by running Pods?

Not always. Some resources may be referenced by workloads scaled to zero, CronJobs, or future rollouts. Always perform a dry run, check workload definitions (Deployments, StatefulSets, DaemonSets), and review resource age before deletion.

What is the safest way to prevent orphaned ConfigMaps and Secrets?

The most effective prevention strategies are:
  • Using ownerReferences so dependent resources are garbage-collected with their owner
  • Adopting GitOps with pruning enabled (ArgoCD / Flux)
  • Applying consistent labeling (app, owner, version)

These ensure unused resources are automatically detected and removed.

Which tools are best for detecting orphaned resources?

Popular and reliable tools include:
  • Kor – purpose-built for detecting unused Kubernetes resources
  • Popeye – broader cluster hygiene and sanitization reports
  • Custom scripts/controllers – useful for tailored environments or integrations

For production clusters, Kor provides the best signal-to-noise ratio.

How often should ConfigMap and Secret cleanup run in production?

A common best practice is:
  • Weekly detection (reporting only)
  • Monthly cleanup for resources older than a defined threshold (e.g. 30–60 days)
  • Immediate cleanup integrated into CI/CD after successful deployments

This balances safety with long-term cluster hygiene.

Kubernetes Gateway API Versions: Complete Compatibility and Upgrade Guide

Kubernetes Gateway API Versions: Complete Compatibility and Upgrade Guide

The Kubernetes Gateway API has rapidly evolved from its experimental roots to become the standard for ingress and service mesh traffic management. But with multiple versions released and various maturity levels, understanding which version to use, how it relates to your Kubernetes cluster, and when to upgrade can be challenging.

In this comprehensive guide, we’ll explore the different Gateway API versions, their relationship to Kubernetes releases, provider support levels, and the upgrade philosophy that will help you make informed decisions for your infrastructure.

Understanding Gateway API Versioning

The Gateway API follows a unique versioning model that differs from standard Kubernetes APIs. Unlike built-in Kubernetes resources that are tied to specific cluster versions, Gateway API CRDs can be installed independently as long as your cluster meets the minimum requirements.

Minimum Kubernetes Version Requirements

As of Gateway API v1.1 and later versions, you need Kubernetes 1.26 or later to run the latest Gateway API releases. The API commits to supporting a minimum of the most recent 5 Kubernetes minor versions, providing a reasonable window for cluster upgrades.

This rolling support window means that if you’re running Kubernetes 1.26, 1.27, 1.28, 1.29, or 1.30, you can safely install and use the latest Gateway API without concerns about compatibility.

Release Channels: Standard vs Experimental

Gateway API uses two distinct release channels to balance stability with innovation. Understanding these channels is critical for choosing the right version for your use case.

Standard Channel

The Standard channel contains only GA (Generally Available, v1) and Beta (v1beta1) level resources and fields. When you install from the Standard channel, you get:

  • Stability guarantees: No breaking changes once a resource reaches Beta or GA
  • Backwards compatibility: Safe to upgrade between minor versions
  • Production readiness: Extensively tested features with multiple implementations
  • Conformance coverage: Full test coverage ensuring portability

Resources in the Standard channel include GatewayClass, Gateway, and HTTPRoute at v1, along with ReferenceGrant (still v1beta1) and GRPCRoute, which graduated to v1 in the v1.1 release.

Experimental Channel

The Experimental channel includes everything from the Standard channel plus Alpha-level resources and experimental fields. This channel is for:

  • Early feature testing: Try new capabilities before they stabilize
  • Cutting-edge functionality: Access the latest Gateway API innovations
  • No stability guarantees: Breaking changes can occur between releases
  • Feature feedback: Help shape the API by testing experimental features

Features may graduate from Experimental to Standard or be dropped entirely based on implementation experience and community feedback.
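
Both channels ship as plain CRD manifests attached to each GitHub release, so choosing a channel is just a matter of which bundle you apply. For example (the version is pinned here purely for illustration):

# Standard channel: GA and Beta resources only
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml

# Experimental channel: adds Alpha resources and experimental fields
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/experimental-install.yaml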

Gateway API Version History and Features

Let’s explore the major Gateway API releases and what each introduced.

v1.0 (October 2023)

The v1.0 release marked a significant milestone, graduating core resources to GA status. This release included:

  • Gateway, GatewayClass, and HTTPRoute at v1 (stable)
  • Full backwards compatibility guarantees for v1 resources
  • Production-ready status for ingress traffic management
  • Multiple conformant implementations across vendors

v1.1 (May 2024)

Version 1.1 expanded the API significantly with service mesh support:

  • GRPCRoute: Native support for gRPC traffic routing
  • Service mesh capabilities: East-west traffic management alongside north-south
  • Multiple implementations: Both Istio and other service meshes achieved conformance
  • Enhanced features: Additional matching criteria and routing capabilities

This version bridged the gap between traditional ingress controllers and full service mesh implementations.

v1.2 and v1.3

These intermediate releases introduced structured release cycles and additional features:

  • Refined conformance testing
  • BackendTLSPolicy (experimental in v1.3)
  • Enhanced observability and debugging capabilities
  • Improved cross-namespace routing

v1.4 (October 2025)

The latest GA release as of this writing, v1.4.0, brought:

  • Continued API refinement
  • Additional experimental features for community testing
  • Enhanced conformance profiles
  • Improved documentation and migration guides

Kubernetes Version Compatibility Matrix

Here’s how Gateway API versions relate to Kubernetes releases:

Gateway API Version | Minimum Kubernetes | Recommended Kubernetes | Release Date
v1.0.x              | 1.25               | 1.26+                  | October 2023
v1.1.x              | 1.26               | 1.27+                  | May 2024
v1.2.x              | 1.26               | 1.28+                  | 2024
v1.3.x              | 1.26               | 1.29+                  | 2024
v1.4.x              | 1.26               | 1.30+                  | October 2025

The key takeaway: Gateway API v1.1 and later all support Kubernetes 1.26+, meaning you can run the latest Gateway API on any reasonably modern cluster.

Gateway Provider Support Levels

Different Gateway API implementations support various versions and feature sets. Understanding provider support helps you choose the right implementation for your needs.

Conformance Levels

Gateway API defines three conformance levels for features:

  1. Core: Features that must be supported for an implementation to claim conformance. These are portable across all implementations.
  2. Extended: Standardized optional features. Implementations indicate Extended support separately from Core.
  3. Implementation-specific: Vendor-specific features without conformance requirements.

Major Provider Support

Istio

Istio reached Gateway API GA support in version 1.22 (May 2024). Istio provides:

  • Full Standard channel support (v1 resources)
  • Service mesh (east-west) traffic management via GAMMA
  • Ingress (north-south) traffic control
  • Experimental support for BackendTLSPolicy (Istio 1.26+)

Istio is particularly strong for organizations needing both ingress and service mesh capabilities in a single solution.

Envoy Gateway

Envoy Gateway tracks Gateway API releases closely. Version 1.4.0 includes:

  • Gateway API v1.3.0 support
  • Compatibility matrix for Envoy Proxy versions
  • Focus on ingress use cases
  • Strong experimental feature adoption

Check the Envoy Gateway compatibility matrix to ensure your Envoy Proxy version aligns with your Gateway API and Kubernetes versions.

Cilium

Cilium integrates Gateway API deeply with its CNI implementation:

  • Per-node Envoy proxy architecture
  • Network policy enforcement for Gateway traffic
  • Both ingress and service mesh support
  • eBPF-based packet processing

Cilium’s unique architecture makes it a strong choice for organizations already using Cilium for networking.

Contour

Contour v1.31.0 implements Gateway API v1.2.1, supporting:

  • All Standard channel v1 resources
  • Most v1alpha2 resources (TLSRoute, TCPRoute, GRPCRoute)
  • BackendTLSPolicy support

Checking Provider Conformance

To verify which Gateway API version and features your provider supports:

  1. Visit the official implementations page: The Gateway API project maintains a comprehensive list of implementations with their conformance levels.
  2. Check provider documentation: Most providers publish compatibility matrices showing Gateway API, Kubernetes, and proxy version relationships.
  3. Review conformance reports: Providers submit conformance test results that detail exactly which Core and Extended features they support.
  4. Test in non-production: Before upgrading production, validate your specific use cases in a staging environment.

Upgrade Philosophy: When and How to Upgrade

One of the most common questions about Gateway API is: “Do I need to run the latest version?” The answer depends on your specific needs and risk tolerance.

Staying on Older Versions

You don’t need to always run the latest Gateway API version. It’s perfectly acceptable to:

  • Stay on an older stable release if it meets your needs
  • Upgrade only when you need specific new features
  • Wait for your Gateway provider to officially support newer versions
  • Maintain stability over having the latest features

The Standard channel’s backwards compatibility guarantees mean that when you do upgrade, your existing configurations will continue to work.

When to Consider Upgrading

Consider upgrading when:

  1. You need a specific feature: A new HTTPRoute matcher, GRPCRoute support, or other functionality only available in newer versions
  2. Your provider recommends it: Gateway providers often optimize for specific Gateway API versions
  3. Security considerations: While rare, security issues could prompt upgrades
  4. Kubernetes cluster upgrades: When upgrading Kubernetes, verify your Gateway API version is compatible with the new cluster version

Safe Upgrade Practices

Follow these best practices for Gateway API upgrades:

1. Stick with Standard Channel

Using Standard channel CRDs makes upgrades simpler and safer. Experimental features can introduce breaking changes, while Standard features maintain compatibility.

2. Upgrade One Minor Version at a Time

While it’s usually safe to skip versions, the most tested upgrade path is incremental. Going from v1.2 to v1.3 to v1.4 is safer than jumping directly from v1.2 to v1.4.

3. Test Before Upgrading

Always test upgrades in non-production environments:

# Install specific Gateway API version in test cluster
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml

4. Review Release Notes

Each Gateway API release publishes comprehensive release notes detailing:

  • New features and capabilities
  • Graduation of experimental features to standard
  • Deprecation notices
  • Upgrade considerations

5. Check Provider Compatibility

Before upgrading Gateway API CRDs, verify your Gateway provider supports the target version. Installing Gateway API v1.4 won’t help if your controller only supports v1.2.

6. Never Overwrite Different Channels

Implementations should never overwrite Gateway API CRDs that use a different release channel. Keep track of whether you’re using Standard or Experimental channel installations.

CRD Management Best Practices

Gateway API CRD management requires attention to detail:

# Check currently installed Gateway API version
kubectl get crd gateways.gateway.networking.k8s.io -o yaml | grep 'gateway.networking.k8s.io/bundle-version'

# Verify which channel is installed
kubectl get crd gateways.gateway.networking.k8s.io -o yaml | grep 'gateway.networking.k8s.io/channel'

Staying Informed About New Releases

Gateway API releases follow a structured release cycle with clear communication channels.

How to Know When New Versions Are Released

  1. GitHub Releases Page: Watch the kubernetes-sigs/gateway-api repository for release announcements
  2. Kubernetes Blog: Major Gateway API releases are announced on the official Kubernetes blog
  3. Mailing Lists and Slack: Join the Gateway API community channels for discussions and announcements
  4. Provider Announcements: Gateway providers announce support for new Gateway API versions through their own channels

Release Cadence

Gateway API follows a quarterly release schedule for minor versions, with patch releases as needed for bug fixes and security issues. This predictable cadence helps teams plan upgrades.

Practical Decision Framework

Here’s a framework to help you decide which Gateway API version to run:

For New Deployments

  • Production workloads: Use the latest GA version supported by your provider
  • Innovation-focused: Consider Experimental channel if you need cutting-edge features
  • Conservative approach: Use v1.1 or later with Standard channel

For Existing Deployments

  • If things are working: Stay on your current version until you need new features
  • If provider recommends upgrade: Follow provider guidance, especially for security
  • If Kubernetes upgrade planned: Verify compatibility, may need to upgrade Gateway API first or simultaneously

Feature-Driven Upgrades

  • Need service mesh support: Upgrade to v1.1 minimum
  • Need GRPCRoute: Upgrade to v1.1 minimum
  • Need BackendTLSPolicy: Requires v1.3+ and provider support for experimental features

Conclusion

Kubernetes Gateway API represents the future of traffic management in Kubernetes, offering a standardized, extensible, and role-oriented API for both ingress and service mesh use cases. Understanding the versioning model, compatibility requirements, and upgrade philosophy empowers you to make informed decisions that balance innovation with stability.

Key takeaways:

  • Gateway API versions install independently from Kubernetes, requiring only version 1.26 or later for recent releases
  • Standard channel provides stability, Experimental channel provides early access to new features
  • You don’t need to always run the latest version—upgrade when you need specific features
  • Verify provider support before upgrading Gateway API CRDs
  • Follow safe upgrade practices: test first, upgrade incrementally, review release notes

By following these guidelines, you can confidently deploy and maintain Gateway API in your Kubernetes infrastructure while making upgrade decisions that align with your organization’s needs and risk tolerance.

Frequently Asked Questions

What is the difference between Kubernetes Ingress and the Gateway API?

Kubernetes Ingress is a legacy API focused mainly on HTTP(S) traffic with limited extensibility. The Gateway API is its successor, offering a more expressive, role-oriented model that supports multiple protocols, advanced routing, better separation of concerns, and consistent behavior across implementations

Which Gateway API version should I use in production today?

For most production environments, you should use the latest GA (v1.x) release supported by your Gateway provider, installed from the Standard channel. This ensures stability, backwards compatibility, and conformance guarantees while still benefiting from ongoing improvements.

Can I upgrade the Gateway API without upgrading my Kubernetes cluster?

Yes. Gateway API CRDs are installed independently of Kubernetes itself. As long as your cluster meets the minimum supported Kubernetes version (1.26+ for recent releases), you can upgrade the Gateway API without upgrading the cluster.

What happens if my Gateway provider does not support the latest Gateway API version?

If your provider lags behind, you should stay on the latest version officially supported by that provider. Installing newer Gateway API CRDs than your controller supports can lead to missing features or undefined behavior. Provider compatibility should always take precedence over running the newest API version.

Is it safe to upgrade Gateway API CRDs without downtime?

In most cases, yes—when using the Standard channel. The Gateway API provides strong backwards compatibility guarantees for GA and Beta resources. However, you should always test upgrades in a non-production environment and verify that your Gateway provider supports the target version.

FreeLens vs OpenLens vs Lens: Choosing the Right Kubernetes IDE

FreeLens vs OpenLens vs Lens: Choosing the Right Kubernetes IDE

Introduction: When a Tool Choice Becomes a Legal and Platform Decision

If you’ve been operating Kubernetes clusters for a while, you’ve probably learned this the hard way:
tooling decisions don’t stay “just tooling” for long.

What starts as a developer convenience can quickly turn into:

  • a licensing discussion with Legal,
  • a procurement problem,
  • or a platform standard you’re stuck with for years.

The Kubernetes IDE ecosystem is a textbook example of this.

Many teams adopted Lens because it genuinely improved day-to-day operations. Then the license changed (we already covered OpenLens vs Lens in the past). Then restrictions appeared. Then forks started to emerge.

Today, the real question is not “Which one looks nicer?” but:

  • Which one is actually maintained?
  • Which one is safe to use in a company?
  • Why is there a fork of a fork?
  • Are they still technically compatible?
  • What is the real switch cost?

Let’s go through this from a production and platform engineering perspective.

The Forking Story: How We Ended Up Here

Understanding the lineage matters because it explains why FreeLens exists at all.

Lens: The Original Product

Lens started as an open-core Kubernetes IDE with a strong community following. Over time, it evolved into a commercial product with:

  • a proprietary license,
  • paid enterprise features,
  • and restrictions on free usage in corporate environments.

This shift was legitimate from a business perspective, but it broke the implicit contract many teams assumed when they standardized on it.

OpenLens: The First Fork

OpenLens was created to preserve:

  • open-source licensing,
  • unrestricted commercial usage,
  • compatibility with Lens extensions.

For a while, OpenLens was the obvious alternative for teams that wanted to stay open-source without losing functionality.

FreeLens: The Fork of the Fork

FreeLens appeared later, and this is where many people raise an eyebrow.

Why fork OpenLens?

Because OpenLens development started to slow down:

  • release cadence became irregular,
  • upstream Kubernetes changes lagged,
  • governance and long-term stewardship became unclear.

FreeLens exists because some contributors were not willing to bet their daily production tooling on a project with uncertain momentum.

This was not ideology. It was operational risk management.

Are the Projects Still Maintained?

Short answer: yes, but not equally.

Lens

  • Actively developed
  • Backed by a commercial vendor
  • Fast adoption of new Kubernetes features

Trade-off:

  • Licensing constraints
  • Paid features
  • Requires legal review in most companies

OpenLens

  • Still maintained
  • Smaller contributor base
  • Slower release velocity

It works, but it no longer feels like a safe long-term default for platform teams.

FreeLens

  • Actively maintained
  • Explicit focus on long-term openness
  • Prioritizes Kubernetes API compatibility and stability

Right now, FreeLens shows the healthiest balance between maintenance and independence.

Technical Compatibility: Can You Switch Without Pain?

This is the good news: yes, mostly.

Cluster Access and Configuration

All three tools:

  • use standard kubeconfig files,
  • support multiple contexts and clusters,
  • work with RBAC, CRDs, and namespaces the same way.

No cluster-side changes are required.
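
Because all three tools read the standard kubeconfig, whatever kubectl can see is what the IDE will show. A quick sanity check before and after switching:

# The contexts listed here are exactly what Lens, OpenLens, or FreeLens will pick up
kubectl config get-contexts
kubectl config current-context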

Extensions and Plugins

  • Most Lens extensions work in OpenLens.
  • Most OpenLens extensions work in FreeLens.
  • Proprietary Lens-only extensions are the main exception.

In real-world usage:

  • ~90% of common workflows are identical
  • differences show up only in edge cases or paid features

UX Differences

There are some UI differences:

  • branding,
  • menu structure,
  • feature gating in Lens.

Nothing that requires retraining or documentation updates.

Legal and Licensing Considerations (This Is Where It Usually Breaks)

This is often the decisive factor in enterprise environments.

Lens

  • Requires license compliance checks
  • Free usage may violate internal policies
  • Paid plans required for broader adoption

If you operate in a regulated or audited environment, this alone can be a blocker.

OpenLens

  • Open-source license
  • Generally safe for corporate use
  • Slight uncertainty due to reduced activity

FreeLens

  • Explicitly open-source
  • No usage restrictions
  • Clear intent to remain free for commercial use

If Legal asks, “Can we standardize this across the company?”
FreeLens is the easiest answer.

Which One Should You Use in a Company?

A pragmatic recommendation:

Use Lens if:

  • you want vendor-backed support,
  • you are willing to pay,
  • you already standardized on Mirantis tooling.

Use OpenLens if:

  • you are already using it,
  • it meets your needs today,
  • you accept slower updates.

Use FreeLens if:

  • you want zero licensing risk,
  • you want an open-source default,
  • you care about long-term maintenance,
  • you need something you can standardize safely.

For most platform and DevOps teams, FreeLens is currently the lowest-risk choice.

Switch Cost: How Expensive Is It Really?

Surprisingly low.

Typical migration:

  • install the new binary,
  • reuse existing kubeconfigs,
  • reinstall extensions if needed.

What you don’t need:

  • cluster changes,
  • CI/CD modifications,
  • platform refactoring.

Downtime: none
Rollback: trivial

This is one of the rare cases where switching early is cheap.

Is a “Fork of a Fork” a Red Flag?

Normally, yes.

In this case, no.

FreeLens exists because:

  • maintenance mattered more than branding,
  • openness mattered more than monetization,
  • predictability mattered more than roadmap promises.

Ironically, this is very aligned with how Kubernetes itself evolved.

Conclusion: A Clear, Boring, Production-Safe Answer

If you strip away GitHub drama and branding:

  • Lens optimizes for revenue and enterprise features.
  • OpenLens preserved openness but lost momentum.
  • FreeLens optimizes for sustainability and freedom.

From a platform engineering perspective:

FreeLens is the safest default Kubernetes IDE today for most organizations.

Low switch cost, strong compatibility, no legal surprises.

And in production environments, boring and predictable almost always wins.

SoapUI Maven Integration: Automate API Testing with Maven Builds

SoapUI Maven Integration: Automate API Testing with Maven Builds

SoapUI is a popular open-source tool used for testing SOAP and REST APIs. It comes with a user-friendly interface and a variety of features to help you test API requests and responses. In this article, we will explore how to use SoapUI integrated with Maven for automation testing.

Why Use SoapUI with Maven?

Maven is a popular build automation tool that simplifies building and managing Java projects. It is widely used in the industry, and it has many features that make it an ideal choice for automation testing with SoapUI.

By integrating SoapUI with Maven, you can easily run your SoapUI tests as part of your Maven build process. This will help you to automate your testing process, reduce the time required to test your APIs, and ensure that your tests are always up-to-date.

Setting Up SoapUI and Maven

Before we can start using SoapUI with Maven, we must set up both tools on our system. First, download and install SoapUI from the official website. Once SoapUI is installed, we can proceed with installing Maven.

To install Maven, follow these steps:

  1. Download the latest version of Maven from the official website.
  2. Extract the downloaded file to a directory on your system.
  3. Add the bin directory of the extracted folder to your system’s PATH environment variable.
  4. Verify that Maven is installed by opening a terminal or command prompt and running the command mvn -version.

Creating a Maven Project for SoapUI Tests

Now that we have both SoapUI and Maven installed, we can create a Maven project for our SoapUI tests. To create a new Maven project, follow these steps:

  1. Open a terminal or command prompt and navigate to the directory where you want to create your project.
  2. Run the following command: mvn archetype:generate -DgroupId=com.example -DartifactId=my-soapui-project -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
  3. This will create a new Maven project with the group ID com.example and the artifact ID my-soapui-project.

Adding SoapUI Tests to the Maven Project

Now that we have a Maven project, we can add our SoapUI tests to the project. To do this, follow these steps:

  1. Create a new SoapUI project by opening SoapUI and selecting File > New SOAP Project.
  2. Follow the prompts to create a new project, including specifying the WSDL file and endpoint for your API.
  3. Once your project is created, create a new test suite and add your test cases.
  4. Save your SoapUI project.

Next, we need to add our SoapUI project to our Maven project. To do this, follow these steps:

  1. In your Maven project directory, create a new directory called src/test/resources.
  2. Copy your SoapUI project file (.xml) to this directory.
  3. In the pom.xml file of your Maven project, add the following code:
<build>
  <plugins>
    <plugin>
      <groupId>com.smartbear.soapui</groupId>
      <artifactId>soapui-maven-plugin</artifactId>
      <version>5.6.0</version>
      <configuration>
        <projectFile>src/test/resources/my-soapui-project.xml</projectFile>
        <outputFolder>target/surefire-reports</outputFolder>
        <junitReport>true</junitReport>
        <exportAll>true</exportAll>
      </configuration>
      <executions>
        <execution>
          <phase>test</phase>
          <goals>
            <goal>test</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

This code configures the SoapUI Maven plugin to run our SoapUI tests during the test phase of the Maven build process.

Creating Assertions in SoapUI Projects

Now that we have our SoapUI tests added to our Maven project, we can create assertions to validate the responses of our API calls. To create assertions in SoapUI, follow these steps:

  1. Open your SoapUI project and navigate to the test case where you want to create an assertion.
  2. Right-click on the step that you want to validate and select Add Assertion.
  3. Choose the type of assertion that you want to create (e.g. Contains, XPath Match, Valid HTTP Status Codes, etc.).
  4. Configure the assertion according to your needs.
  5. Save your SoapUI project.

Running SoapUI Tests with Assertions Using Maven

Now that we have our SoapUI tests and assertions added to our Maven project, we can run them using Maven. To run your SoapUI tests with Maven and validate the responses using assertions, follow these steps:

  1. Open a terminal or command prompt and navigate to your Maven project directory.
  2. Run the following command: mvn clean test
  3. This will run your SoapUI tests and generate a report in the target/surefire-reports directory of your Maven project.

During the test execution, if any assertion fails, the test will fail and an error message will be displayed in the console. By creating assertions, we can ensure that our API calls are returning the expected responses.

Conclusion

In this article, we have learned how to use SoapUI integrated with Maven for automation testing, including how to create assertions in SoapUI projects. By using these two tools together, we can automate our testing process, reduce the time required to test our APIs, and ensure that our tests are always up-to-date. If you are looking to get started with automation testing using SoapUI and Maven, give this tutorial a try!

Kubernetes Autoscaling 1.26 Explained: HPA v2 Changes and Impact on KEDA

Kubernetes Autoscaling 1.26 Explained: HPA v2 Changes and Impact on KEDA

Introduction

Kubernetes autoscaling has undergone a dramatic change. As of the Kubernetes 1.26 release, HorizontalPodAutoscaler objects should be migrated from the v1 API to the v2 API, which has been available since Kubernetes 1.23.

The HorizontalPodAutoscaler is a crucial component for any workload deployed on a Kubernetes cluster, as scalability is one of the great benefits and key features of this kind of environment.

A little bit of History

Kubernetes introduced an autoscaling capability a long time ago, back in version 1.3, released in 2016. The solution is based on a control loop that runs at a fixed interval, which you can configure with the --horizontal-pod-autoscaler-sync-period flag of the kube-controller-manager.

Once per interval, the controller fetches the metrics and evaluates them against the conditions defined in the HorizontalPodAutoscaler object. Initially, scaling could only be based on the compute resources consumed by the pods: memory and CPU.
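For reference, the interval is a flag on the kube-controller-manager command line (for example, in the static pod manifest of a kubeadm-managed control plane), and its documented default is 15 seconds:
--horizontal-pod-autoscaler-sync-period=15s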

This was an excellent feature, but as time passed and Kubernetes adoption grew, it proved too narrow to handle every scenario. This is where other projects we have discussed here, such as KEDA, come into the picture to provide a much more flexible set of features.

Kubernetes Autoscaling Capabilities Introduced in v2

The v2 release of the autoscaling API introduces a range of capabilities that improve the flexibility and options available. The most relevant ones are the following:

  • Scaling on custom metrics: With the new release, you can configure a HorizontalPodAutoscaler object to scale using custom metrics. Custom metrics here means metrics generated from inside the Kubernetes cluster and associated with Kubernetes objects, rather than the built-in CPU and memory resource metrics. You can find a detailed walkthrough of using custom metrics in the official documentation.
  • Scaling on multiple metrics: With the new release, you also have the option to scale based on more than one metric. The HorizontalPodAutoscaler will evaluate each scaling rule, propose a new scale value for each of them, and take the maximum value as the final one.
  • Support for the Metrics APIs: With the new release, the HorizontalPodAutoscaler controller retrieves metrics from a series of registered APIs, such as metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io. For more information on the different metrics available, you can take a look at the design proposal.
  • Configurable scaling behavior: With the new release, a new field, behavior, allows you to configure how the component behaves when scaling up or down. You can define separate policies for scaling up and scaling down, and limit how many replicas can be added or removed in a given period, which helps with the start-up spikes of some workloads such as Java applications. You can also define a stabilization window to avoid thrashing while the metric is still fluctuating (a sample manifest follows this list).
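As a sample, here is a minimal sketch of an autoscaling/v2 HorizontalPodAutoscaler combining these capabilities: a built-in CPU metric, a hypothetical custom metric served through custom.metrics.k8s.io, and a behavior block. The workload name, metric name, and thresholds are placeholders to adapt to your own deployment:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api                      # hypothetical Deployment to scale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Built-in resource metric: keep average CPU utilization around 70%
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Hypothetical custom metric exposed via custom.metrics.k8s.io
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleUp:
      policies:
        - type: Percent
          value: 100                    # at most double the replica count
          periodSeconds: 60             # per one-minute window
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of stable metrics
      policies:
        - type: Pods
          value: 2                      # remove at most 2 pods
          periodSeconds: 60             # per one-minute window
You can apply a manifest like this with kubectl apply and then inspect the computed metrics and scaling events with kubectl describe hpa.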

Kubernetes Autoscaling v2 vs KEDA

We have seen all the new benefits that Autoscaling v2 provides, so I’m sure that most of you are asking the same question: Is Kubernetes Autoscaling v2 killing KEDA?

Recent releases of KEDA already use the new objects under the autoscaling/v2 group, as KEDA relies on the native Kubernetes objects. KEDA also simplifies much of the work required to use custom or external metrics, since it provides scalers for pretty much everything you could need now or even in the future.

Even so, there are still features that KEDA provides that autoscaling/v2 does not cover, such as the scaling "from zero" and "to zero" capabilities, which are very relevant for specific kinds of workloads and for optimizing resource usage. Still, it's safe to say that with the new features included in the autoscaling/v2 release, the gap is now smaller. Depending on your needs, you may be able to rely on the out-of-the-box capabilities without adding a new component to your architecture.
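For comparison, here is a minimal sketch of a KEDA ScaledObject that scales the same hypothetical deployment down to zero replicas based on a Prometheus query; the server address, query, and threshold are placeholders:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-api-scaler
spec:
  scaleTargetRef:
    name: orders-api                    # hypothetical Deployment
  minReplicaCount: 0                    # scale to zero when traffic stops
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # placeholder
        query: sum(rate(http_requests_total{app="orders-api"}[2m]))
        threshold: "100"
Under the hood, KEDA creates and manages an autoscaling/v2 HorizontalPodAutoscaler for the one-to-N part of the range and handles the zero-to-one activation itself.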
