Helm Drivers Explained: Secrets, ConfigMaps, and State Storage in Helm

When working seriously with Helm in production environments, one of the less-discussed but highly impactful topics is how Helm stores and manages release state. This is where Helm drivers come into play. Understanding Helm drivers is not just an academic exercise; it directly affects security, scalability, troubleshooting, and even disaster recovery strategies.

Understanding Helm drivers is critical for production deployments. This is just one of many essential topics covered in our comprehensive Helm package management guide.

What Helm Drivers Are and How They Are Configured

A Helm driver defines the backend storage mechanism Helm uses to persist release information such as manifests, values, and revision history. Every Helm release has state, and that state must live somewhere. The driver determines where and how this data is stored.

Helm drivers are configured using the HELM_DRIVER environment variable. If the variable is not explicitly set, Helm defaults to using Kubernetes Secrets.

export HELM_DRIVER=secrets

This simple configuration choice can have deep operational consequences, especially in regulated environments or large-scale clusters.

Available Helm Drivers

Secrets Driver (Default)

The secrets driver stores release information as Kubernetes Secrets in the target namespace. This has been the default driver since Helm 3 was introduced.

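Secret data is only base64-encoded, which is an encoding, not encryption; it is protected at rest only if encryption at rest is configured for the cluster's API server. What the driver does provide out of the box is access control: reading Secrets is usually restricted more tightly by RBAC than reading ConfigMaps. This makes it suitable for clusters with moderate security requirements.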

ConfigMaps Driver

The configmaps driver stores Helm release state as Kubernetes ConfigMaps. Functionally, it behaves very similarly to the secrets driver but without any form of implicit confidentiality.

export HELM_DRIVER=configmaps

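This driver is often used in development or troubleshooting scenarios where easier inspection is preferred. Keep in mind that Helm still stores the release payload gzip-compressed and base64-encoded inside the ConfigMap, so it must be decoded before it is readable.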
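To see what either driver actually stores, here is a minimal Python sketch of the encoding. It assumes the behavior of Helm 3's storage drivers: each revision lives in an object named sh.helm.release.v1.<release>.v<revision>, and its release field holds the release serialized as JSON, gzip-compressed, and base64-encoded; for Secrets, Kubernetes base64-encodes the data once more. The helper names are hypothetical, and the payload would come from something like kubectl get secret sh.helm.release.v1.my-release.v1 -o jsonpath='{.data.release}'.

```python
import base64
import gzip
import json

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def release_object_name(release: str, revision: int) -> str:
    """Name of the Secret/ConfigMap that holds one revision of a release."""
    return f"sh.helm.release.v1.{release}.v{revision}"


def decode_release(payload: str, from_secret: bool = True) -> dict:
    """Turn the raw .data.release value into a Python dict.

    from_secret=True for the secrets driver, where Kubernetes adds an extra
    base64 layer on top of Helm's own; False for the configmaps driver.
    """
    data = base64.b64decode(payload)  # outermost base64 layer
    if from_secret:
        # Secret data: that was the Kubernetes layer; now undo Helm's own base64.
        data = base64.b64decode(data)
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return json.loads(data)


def encode_release(release: dict, for_secret: bool = True) -> str:
    """Inverse of decode_release (JSON -> gzip -> base64), useful for tests."""
    encoded = base64.b64encode(gzip.compress(json.dumps(release).encode()))
    if for_secret:
        encoded = base64.b64encode(encoded)
    return encoded.decode()


if __name__ == "__main__":
    # Toy release; real records also carry chart, config, manifest and info.
    sample = {"name": "my-release", "namespace": "default", "version": 1}
    payload = encode_release(sample)
    print(release_object_name("my-release", 1))  # sh.helm.release.v1.my-release.v1
    print(decode_release(payload)["name"])       # my-release
```

The same layering explains the common shell idiom of piping a Secret's release field through base64 -d twice and then gunzip, while a ConfigMap needs only one base64 -d.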

Memory Driver

The memory driver stores release information only in memory. Once the Helm process exits, all state is lost.

export HELM_DRIVER=memory

This driver is rarely used outside of testing, CI pipelines, or ephemeral validation workflows.

Evolution of Helm Drivers

Helm's storage model changed significantly with the release of Helm 3 in late 2019. Helm 2 relied on Tiller, which by default stored release state as ConfigMaps in its own namespace, introducing security and operational complexity. Helm 3 removed Tiller entirely, moved release state into the release's own namespace, and made Secrets the default storage backend.

Since then, the only new backend has been a beta SQL driver (PostgreSQL only); otherwise, improvements have focused on performance, stability, and better error handling. The core abstraction has remained intentionally small to avoid fragmentation.

Practical Use Cases and When to Use Each Driver

In production Kubernetes clusters, the secrets driver is almost always the right choice. It integrates naturally with RBAC, supports encryption at rest, and aligns with Kubernetes-native security models.

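ConfigMaps can be useful when debugging failed upgrades or learning Helm internals, as the stored data is easier to inspect. However, the configmaps driver should be avoided in environments handling sensitive values, because the release record includes the supplied chart values, including any secrets passed to Helm.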

The memory driver shines in CI/CD pipelines where chart validation or rendering is needed without polluting a cluster with state.

Practical Examples

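Switching drivers per command can be useful when inspecting a release stored with a non-default backend. The driver must match the backend the release was originally written to; a release installed with the secrets driver is not visible to the configmaps driver: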

HELM_DRIVER=configmaps helm get manifest my-release

Or running a dry validation in CI:

HELM_DRIVER=memory helm upgrade --install test ./chart --dry-run

Final Thoughts

Helm drivers are rarely discussed, yet they influence how reliable, secure, and observable your Helm workflows are. Treating the choice of driver as a deliberate architectural decision rather than a default setting is one of those small details that differentiate mature DevOps practices from ad-hoc automation.

Helm 4.0 Features, Breaking Changes & Migration Guide 2025

Helm is one of the core tools of the Kubernetes ecosystem, so the release of a new major version such as Helm 4.0 is something every team using it will need to analyze, evaluate, and manage in the coming months.

Helm 4.0 represents a major milestone in Kubernetes package management.

Because of this, many comments and articles on the topic will appear, so here we try to shed some light on it.

Helm 4.0 Key Features and Improvements

According to the project's own announcement, Helm 4 introduces three major blocks of changes: a new plugin system, better integration with Kubernetes, and internal modernization of the SDK and performance.

New Plugin System (includes WebAssembly)

The plugin system has been completely redesigned with a special focus on security. It introduces a new WebAssembly runtime that, while optional, is recommended because it runs plugins in a sandbox that enforces limits and offers security guarantees.

In any case, there is no need to worry excessively, as the “classic” plugins continue to work, but the message is clear: for security and extensibility, the direction is Wasm.

Server-Side Apply and Better Integration with Other Controllers

Starting with this version, Helm supports Server-Side Apply (SSA) through the --server-side flag. SSA has been stable in Kubernetes since v1.22; it lets the API server compute updates to objects and track which manager owns each field, which avoids conflicts between different controllers managing the same resources.

It also integrates kstatus to determine the state of a resource more reliably than the readiness checks behind the --wait parameter in Helm 3.

Other Additional Improvements

Additionally, there is another list of improvements that, while of lesser scope, are important qualitative leaps, such as the following:

  • Installation by digest in OCI registries: (helm install myapp oci://...@sha256:<digest>)
  • Multi-document values: you can pass multiple YAML values in a single multi-doc file, facilitating complex environments/overlays.
  • Continued support for the --set-json argument (available since Helm 3.10), which makes passing complex structures much easier than with the --set parameter
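The benefit of --set-json is easiest to see when values are generated programmatically. A minimal sketch, assuming helm is on the PATH when you actually run the command: it builds an argument list in which each top-level key becomes one --set-json key=<JSON> pair. The function name, release name, chart path, and values are hypothetical.

```python
import json


def helm_upgrade_args(release: str, chart: str, json_values: dict) -> list:
    """Build argv for 'helm upgrade --install' with one --set-json per key."""
    args = ["helm", "upgrade", "--install", release, chart]
    for key, value in json_values.items():
        # Compact separators avoid extra whitespace inside each argument.
        compact = json.dumps(value, separators=(",", ":"))
        args += ["--set-json", f"{key}={compact}"]
    return args


if __name__ == "__main__":
    argv = helm_upgrade_args(
        "myapp",
        "./chart",
        {"resources": {"limits": {"cpu": "500m"}}, "tolerations": []},
    )
    print(argv)
    # To execute: subprocess.run(argv, check=True). An argv list needs no shell quoting.
```

With --set, the same values would need index syntax such as tolerations[0].key=... and escaping for commas, which is exactly what --set-json avoids.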

Why a Major (v4) and Not Another Minor of 3.x?

As explained in the official release post, there were features that the team could not introduce in v3 without breaking public SDK APIs and internal architecture:

  • Strong change in the plugin system (WebAssembly, new types, deep integration with the core).
  • Restructuring of Go packages and establishment of a stable SDK at helm.sh/helm/v4, code-incompatible with v3.
  • Introduction and future evolution of Charts v3, which require the SDK to support multiple versions of chart APIs.

With all this, continuing in the 3.x branch would have violated SemVer: the major number change is basically “paying” the accumulated technical debt to be able to move forward.

In addition, a new evolution of the chart format is expected, moving from chart API v2 to a future v3 that is not yet fully defined. For now, v2 charts run correctly on Helm 4.

Is Helm 4.0 Migration Required?

The short answer is: yes. And possibly the long answer is: yes, and quickly. In the official Helm 4 announcement, they specify the support schedule for Helm 3:

  • Helm 3 bug fixes until July 8, 2026.
  • Helm 3 security fixes until November 11, 2026.
  • No new features will be backported to Helm 3 during this period; only Kubernetes client libraries will be updated to support new K8s versions.

Practical translation:

  • Organizations have roughly one year to plan a smooth migration to Helm 4.0, but Helm 3 bug fixes stop in July 2026, so only security fixes remain for the final months.
  • After November 2026, continuing to use Helm 3 will become increasingly risky from a security and compatibility standpoint.
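The practical translation boils down to simple date arithmetic. A minimal sketch, assuming each published end date is inclusive (the last day on which fixes are still provided); the function and constant names are hypothetical.

```python
from datetime import date

# Helm 3 end-of-support dates from the Helm 4 announcement.
HELM3_BUGFIX_END = date(2026, 7, 8)
HELM3_SECURITY_END = date(2026, 11, 11)


def helm3_support_status(today: date) -> dict:
    """Days left in each Helm 3 support window (negative once it has passed).

    Assumes each end date is the last day on which fixes are still provided.
    """
    return {
        "bugfix_days_left": (HELM3_BUGFIX_END - today).days,
        "security_days_left": (HELM3_SECURITY_END - today).days,
        "bugfixes_active": today <= HELM3_BUGFIX_END,
        "security_fixes_active": today <= HELM3_SECURITY_END,
    }


if __name__ == "__main__":
    status = helm3_support_status(date.today())
    for key, value in status.items():
        print(f"{key}: {value}")
```

The two end dates are 126 days apart, and during that stretch Helm 3 receives security fixes only.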

Best Practices for Migration

To carry out the migration, remember that both versions can be installed side by side on the same machine or agent. This allows a gradual migration, with everything moved correctly before Helm 3 reaches end of support. The following steps are recommended:

  • Conduct an analysis of all Helm commands and usage from the perspective of integration pipelines, upgrade scripts, or even the import of Helm client libraries in Helm-based developments.
  • Review especially carefully all uses of --post-renderer, helm registry login, --atomic, and --force.
  • After the analysis, start testing Helm 4 in non-production environments, reusing the same charts and values, and revert to Helm 3 if a problem is detected until it is resolved.
  • If you have critical plugins, explicitly test them with Helm 4 before making the global change.
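The first two steps can be partly automated. A rough sketch, assuming your pipelines and scripts live in a repository as shell, YAML, or Makefile files: it searches for the usages the checklist singles out. It only finds candidates for manual review and knows nothing about how Helm 4 treats each flag; the function names and file patterns are illustrative.

```python
import re
from pathlib import Path

# Usages the migration checklist asks to review. The negative lookahead keeps
# longer flags such as --force-update or --post-renderer-args from matching.
RISKY_PATTERNS = {
    "--post-renderer": re.compile(r"--post-renderer(?![\w-])"),
    "helm registry login": re.compile(r"\bhelm\s+registry\s+login\b"),
    "--atomic": re.compile(r"--atomic(?![\w-])"),
    "--force": re.compile(r"--force(?![\w-])"),
}


def scan_text(text: str) -> list:
    """Return (line_number, usage) pairs for every risky usage in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for usage, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, usage))
    return hits


def scan_tree(root: str, globs=("*.sh", "*.yaml", "*.yml", "Makefile")) -> dict:
    """Scan pipeline definitions and scripts below root; map path -> hits."""
    results = {}
    for glob in globs:
        for path in Path(root).rglob(glob):
            if path.is_file():
                hits = scan_text(path.read_text(errors="ignore"))
                if hits:
                    results[str(path)] = hits
    return results


if __name__ == "__main__":
    for path, hits in scan_tree(".").items():
        for lineno, usage in hits:
            print(f"{path}:{lineno}: review {usage}")
```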

What are the main new features in Helm 4.0?

Helm 4.0 introduces three major improvements: a redesigned plugin system with WebAssembly support for enhanced security, Server-Side Apply (SSA) integration for better conflict resolution, and internal SDK modernization for improved performance. Additional features include OCI digest installation and multi-document values support.

When does Helm 3 support end?

Helm 3 bug fixes end July 8, 2026 and security fixes end November 11, 2026. No new features will be backported to Helm 3. Organizations should plan migration to Helm 4.0 before November 2026 to avoid security and compatibility risks.

Are Helm 3 charts compatible with Helm 4.0?

Yes, Helm Chart API v2 charts work correctly with Helm 4.0. However, the Go SDK has breaking changes, so applications using Helm libraries need code updates. The CLI commands remain largely compatible for most use cases.

Can I run Helm 3 and Helm 4 simultaneously?

Yes, both versions can be installed on the same machine, enabling gradual migration strategies. This allows teams to test Helm 4.0 in non-production environments while maintaining Helm 3 for critical workloads during the transition period.

What should I test before migrating to Helm 4.0?

Focus on testing critical plugins, post-renderers, and specific flags like --atomic, --force, and helm registry login. Test all charts and values in non-production environments first, and review any custom integrations using Helm SDK libraries.

What is Server-Side Apply in Helm 4.0?

Server-Side Apply (SSA) is enabled with the --server-side flag and handles resource updates on the Kubernetes API server side. This prevents conflicts between different controllers managing the same resources and has been stable since Kubernetes v1.22.

Resolving Kubernetes Ingress Issues: Limitations and Gateway Insights

Introduction

Ingresses have been, since the early versions of Kubernetes, the most common way to expose applications to the outside. Although their initial design was simple and elegant, the success of Kubernetes and the growing complexity of use cases have turned Ingress into a problematic piece: limited, inconsistent between vendors, and difficult to govern in enterprise environments.

In this article, we analyze why Ingresses have become a constant source of friction, how different Ingress Controllers have influenced this situation, and why more and more organizations are considering alternatives like Gateway API.

What Ingresses are and why they were designed this way

The Ingress ecosystem revolves around two main resources:

🏷️ IngressClass

Defines which controller will manage the associated Ingresses. Its scope is cluster-wide, so it is usually managed by the platform team.

🌐 Ingress

It is the resource that developers use to expose a service. It allows defining routes, domains, TLS certificates, and little more.

Its specification is minimal by design, which allowed for rapid adoption, but also laid the foundation for current problems.

The problem: a standard too simple for complex needs

As Kubernetes became an enterprise standard, users wanted to replicate advanced configurations of traditional proxies: rewrites, timeouts, custom headers, CORS, etc.
But Ingress did not provide native support for all this.

Vendors reacted… and chaos was born.

Annotations vs CRDs: two incompatible paths

Different Ingress Controllers have taken very different paths to add advanced capabilities:

📝 Annotations (NGINX, HAProxy…)

Advantages:

  • Flexible and easy to use
  • Directly in the Ingress resource

Disadvantages:

  • Hundreds of proprietary annotations
  • Fragmented documentation
  • Non-portable configurations between vendors

📦 Custom CRDs (Traefik, Kong…)

Advantages:

  • More structured and powerful
  • Better validation and control

Disadvantages:

  • Adds new non-standard objects
  • Requires installation and management
  • Less interoperability

Result?
Infrastructures deeply coupled to a vendor, complicating migrations, audits, and automation.

The complexity for development teams

The design of Ingress implies two very different responsibilities:

  • Platform: defines IngressClass
  • Application: defines Ingress

But the reality is that the developer ends up making decisions that should be the responsibility of the platform area:

  • Certificates
  • Security policies
  • Rewrite rules
  • CORS
  • Timeouts
  • Corporate naming practices

This causes:

  • Inconsistent configurations
  • Bottlenecks in reviews
  • Constant dependency between teams
  • Lack of effective standardization

In large companies, where security and governance are critical, this is especially problematic.

NGINX Ingress: the decommissioning that reignited the debate

The announced retirement of the community-maintained Ingress NGINX controller (kubernetes/ingress-nginx) has highlighted the fragility of the ecosystem:

  • Thousands of clusters depend on it
  • Multiple projects use its annotations
  • Migrating involves rewriting entire configurations

This has reignited the conversation about the need for a real standard… and this is where Gateway API comes in.
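Before planning a migration away from ingress-nginx, it helps to measure how much a cluster depends on its annotations. A minimal sketch, assuming you feed it the items list from kubectl get ingress --all-namespaces -o json (loaded with json.load); the function name is hypothetical and the sample data is made up.

```python
from collections import Counter

NGINX_PREFIX = "nginx.ingress.kubernetes.io/"


def nginx_annotation_inventory(ingresses: list) -> Counter:
    """Count how many Ingress objects use each ingress-nginx annotation."""
    counts = Counter()
    for ingress in ingresses:
        annotations = ingress.get("metadata", {}).get("annotations") or {}
        for key in annotations:
            if key.startswith(NGINX_PREFIX):
                counts[key[len(NGINX_PREFIX):]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample; in practice pass json.load(...)["items"] from the kubectl output.
    items = [
        {"metadata": {"name": "shop", "annotations": {
            "nginx.ingress.kubernetes.io/rewrite-target": "/",
            "nginx.ingress.kubernetes.io/proxy-read-timeout": "120"}}},
        {"metadata": {"name": "api", "annotations": {
            "nginx.ingress.kubernetes.io/rewrite-target": "/$2"}}},
    ]
    for name, uses in nginx_annotation_inventory(items).most_common():
        print(f"{uses:3d}  {name}")
```

The annotations that appear most often, or that encode behavior such as rewrites, are the ones that will need an equivalent in the target controller or in Gateway API.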

Gateway API: a promising alternative (but not perfect)

Gateway API was born to solve many of the limitations of Ingress:

  • Clear separation of responsibilities (infrastructure vs application)
  • Standardized extensibility
  • More types of routes (HTTPRoute, TCPRoute…)
  • Greater expressiveness without relying on proprietary annotations

But it also brings challenges:

  • Requires gradual adoption
  • Not all vendors implement the same set of features
  • Migration is not trivial

Even so, it is shaping up to be the future of traffic management in Kubernetes.

Conclusion

Ingresses have been fundamental to the success of Kubernetes, but their own simplicity has led them to become a bottleneck. The lack of interoperability, differences between vendors, and complex governance in enterprise environments make it clear that it is time to adopt more mature models.

Gateway API is not perfect, but it moves in the right direction.
Organizations that want future stability should start planning their transition.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Kubernetes Node Affinity Explained: Scheduling Rules, Trade-offs & Best Practices

What is Kubernetes Node Affinity? Benefits and Core Concepts

Kubernetes node affinity is an essential scheduling feature that allows you to control pod placement based on node labels and properties. By using node affinity rules, you can specify constraints on which nodes pods can be scheduled, enabling you to optimize resource allocation and enhance performance.

Node affinity works by allowing you to define rules for pod scheduling based on node labels. When defining node affinity rules, you have two options: required and preferred rules. Required rules ensure that pods are scheduled only on nodes that satisfy the defined criteria. If no suitable node is available, the pod remains unscheduled. On the other hand, preferred rules provide a soft constraint and attempt to schedule pods on nodes that match the specified criteria. However, if no such node is available, the pod can still be scheduled on other nodes.

Node affinity rules are an expanded version of the simpler node selectors. A node selector is the most basic form of node constraint: you label nodes and list the required labels in the pod specification's nodeSelector field, so the pod is scheduled only on nodes that carry all of those labels. Node selectors are useful for basic requirements but lack the flexibility and fine-grained control that node affinity provides.
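The difference between nodeSelector, required rules, and preferred rules can be sketched in a few lines of Python. This is a simplified model, not the kube-scheduler: it supports only the In, NotIn, Exists, and DoesNotExist operators, ignores matchFields, resources, and taints, and picks the candidate with the highest sum of preferred weights (the real scheduler combines this with other scoring plugins). Function names, node names, and labels are hypothetical.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]


def expr_matches(labels: Labels, expr: dict) -> bool:
    """Evaluate one matchExpressions entry (Gt/Lt left out for brevity)."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def term_matches(labels: Labels, term: dict) -> bool:
    """Expressions in one term are ANDed; an empty term matches nothing."""
    exprs = term.get("matchExpressions", [])
    return bool(exprs) and all(expr_matches(labels, e) for e in exprs)


def feasible_nodes(nodes: Dict[str, Labels], node_selector: Labels,
                   required_terms: List[dict]) -> List[str]:
    """Hard filter: nodeSelector AND at least one required term (terms are ORed)."""
    feasible = []
    for name, labels in nodes.items():
        if any(labels.get(k) != v for k, v in node_selector.items()):
            continue
        if required_terms and not any(term_matches(labels, t) for t in required_terms):
            continue
        feasible.append(name)
    return feasible


def pick_node(nodes: Dict[str, Labels], candidates: List[str],
              preferred: List[dict]) -> Optional[str]:
    """Soft scoring: sum the weights of matching preferred terms; highest wins."""
    if not candidates:
        return None  # nothing feasible: the pod stays Pending

    def score(name: str) -> int:
        return sum(p["weight"] for p in preferred
                   if term_matches(nodes[name], p["preference"]))

    return max(candidates, key=score)


if __name__ == "__main__":
    nodes = {
        "node-a": {"disktype": "ssd", "zone": "z1"},
        "node-b": {"disktype": "ssd", "zone": "z2"},
        "node-c": {"disktype": "hdd", "zone": "z1"},
    }
    required = [{"matchExpressions": [
        {"key": "disktype", "operator": "In", "values": ["ssd"]}]}]
    preferred = [{"weight": 50, "preference": {"matchExpressions": [
        {"key": "zone", "operator": "In", "values": ["z2"]}]}}]
    candidates = feasible_nodes(nodes, {}, required)
    print(candidates)                               # ['node-a', 'node-b']
    print(pick_node(nodes, candidates, preferred))  # node-b
```

If both SSD nodes disappeared, feasible_nodes would return an empty list and the pod would stay Pending, while dropping the preference would merely change which SSD node wins.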

Node Affinity Trade-offs: Required vs Preferred Rules and Failure Scenarios

But this capability has trade-offs you need to take into consideration, because nothing comes without a price. So let's go to the important question: what is the worst-case scenario for each of these options?

Consider a stateful workload, like a distributed database (e.g., etcd or ZooKeeper), deployed with three replicas for consensus and fault tolerance. You decide to dedicate a set of nodes to this workload and use node affinity rules to ensure the pods are scheduled on those nodes. Now you need to think: should you use the preferred mode or the required mode?

Let's say you go with the required option. What happens if one of your nodes goes down? The scheduler tries to place the replacement replica, but unless another node carries the same label, it cannot be scheduled and stays Pending. If you additionally defined a pod anti-affinity rule so that each replica runs on a different host (so that losing one node costs you only a single replica), the replacement cannot even share one of the remaining labeled nodes. It stays Pending even when other, unlabeled nodes are available. So this option is not as reliable as it looks.

OK, so you go with the preferred option to make sure your workload is always scheduled, even on another node. In that case you can end up with the pods running on other nodes while the labeled nodes sit without the workload they were meant for. Because the rule is only evaluated at scheduling time (the IgnoredDuringExecution part of its name), the pods do not move back on their own when a labeled node becomes available again. That makes the situation confusing and harder to administer, because you can no longer be sure your workloads are on the nodes you expected.

In addition, if the labeled nodes also carry taints to keep other workloads away, you can end up with the labeled pods scheduled on unlabeled nodes, while the other pods cannot use the labeled nodes because they don't tolerate the taints. If the unlabeled nodes then run short of resources, those other workloads may fail to schedule at all. So you end up impacting, and potentially blocking, the scheduling of the other workloads.
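The failure scenario can be reproduced with a toy model: three labeled db nodes, two unlabeled app nodes, one replica per labeled node, and a required pod anti-affinity rule. This sketch assumes a single label key, ignores resources and taints, and uses hypothetical node and function names; it only illustrates the outcome of each mode when db-3 fails.

```python
from typing import Dict, Optional, Set, Tuple


def place_replica(nodes: Dict[str, Dict[str, str]], occupied: Set[str],
                  label: Tuple[str, str], mode: str) -> Optional[str]:
    """Pick a node for one replica, or None if it would stay Pending.

    nodes: Ready nodes -> labels. occupied: nodes already running a replica,
    excluded by the (required) pod anti-affinity rule.
    mode: "required" or "preferred" node affinity towards `label`.
    """
    key, value = label
    candidates = [n for n in nodes if n not in occupied]
    labeled = [n for n in candidates if nodes[n].get(key) == value]
    if labeled:
        return labeled[0]
    if mode == "required":
        return None  # no free labeled node: the hard rule blocks every other node
    return candidates[0] if candidates else None  # preferred: fall back


def simulate_node_failure(mode: str) -> Optional[str]:
    nodes = {
        "db-1": {"role": "db"}, "db-2": {"role": "db"}, "db-3": {"role": "db"},
        "app-1": {}, "app-2": {},
    }
    occupied = {"db-1", "db-2", "db-3"}  # one replica per labeled node
    # db-3 fails: its replica needs a new home.
    del nodes["db-3"]
    occupied.discard("db-3")
    return place_replica(nodes, occupied, ("role", "db"), mode)


if __name__ == "__main__":
    print(simulate_node_failure("required"))   # None: the replica stays Pending
    print(simulate_node_failure("preferred"))  # app-1: runs outside the db pool
    # When db-3 recovers, the replica on app-1 stays put: node affinity is only
    # evaluated at scheduling time (IgnoredDuringExecution).
```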

Preparing for Unexpected Outages with Node Affinity

So, as you can see, each decision has disadvantages you need to consider before defining these rules. If you don't, you will probably discover them in a production environment as the result of an unexpected outage. While nothing goes wrong, everything works as expected; the whole reason these features exist is to give you the tools and options to be prepared when bad things happen.

So, the next time you need to define a node affinity rule, think about the disadvantages of each option, select the one that works best for you, and mitigate the problems it can bring to your production environment.

Frequently Asked Questions

What is the difference between nodeSelector and node affinity in Kubernetes?

nodeSelector is a simple field that requires a node to have all specified labels. Node affinity is a more expressive API that supports complex operators like In, NotIn, and Exists, and distinguishes between hard (requiredDuringScheduling...) and soft (preferredDuringScheduling...) constraints. Use nodeSelector for basic needs; use node affinity for advanced scheduling logic.

When should I use required vs preferred node affinity rules?

Use required rules for strict placement needs, like licensing constraints or specific hardware (e.g., GPU nodes). Use preferred rules for optimization, like trying to place pods on nodes in the same availability zone for lower latency. Be aware that required rules can prevent scheduling during node failures, while preferred rules may not guarantee optimal placement.

What are the risks of using required node affinity?

The primary risk is scheduling failure. If no node matches the required rules (e.g., due to a failure or label mismatch), the pod will remain Pending. This can lead to application downtime, especially if combined with Pod Anti-Affinity, which further restricts eligible nodes. Always ensure you have enough labeled nodes to handle failures.

How does node affinity interact with taints and tolerations?

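They are independent checks that must both pass. When scheduling a pod, the scheduler filters out nodes that don't satisfy its node affinity/selector rules as well as nodes with NoSchedule or NoExecute taints the pod does not tolerate; the order in which these filters run does not change the result. A pod will only be scheduled on a node that satisfies both its affinity/selector requirements and for which the pod has a matching toleration for every such taint. Note that a toleration only allows a pod onto a tainted node; it does not attract it there, which is why taints are usually combined with node affinity.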
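The matching rules can be sketched as follows. This is a simplified model under stated assumptions: it follows the documented toleration semantics (Equal is the default operator, Exists ignores the value, an empty key with Exists tolerates everything, and an empty effect matches all effects), treats PreferNoSchedule as soft, and uses plain label equality in place of full node affinity. All function names and values are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Does one toleration match one taint (documented Kubernetes rules)?"""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False  # an empty effect on the toleration matches any effect
    if toleration.get("operator", "Equal") == "Exists":
        # Exists ignores the value; an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def node_admits(taints: list, tolerations: list) -> bool:
    """Every NoSchedule/NoExecute taint must be tolerated.

    PreferNoSchedule taints are soft, so they never block scheduling here.
    """
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )


def schedulable(node_labels: dict, taints: list, tolerations: list,
                required_labels: dict) -> bool:
    """Both checks must pass: label requirements AND taint tolerations."""
    labels_ok = all(node_labels.get(k) == v for k, v in required_labels.items())
    return labels_ok and node_admits(taints, tolerations)


if __name__ == "__main__":
    db_taint = [{"key": "dedicated", "value": "db", "effect": "NoSchedule"}]
    db_toleration = [{"key": "dedicated", "operator": "Equal",
                      "value": "db", "effect": "NoSchedule"}]
    # A toleration alone is not enough without the matching labels ...
    print(schedulable({}, db_taint, db_toleration, {"role": "db"}))              # False
    # ... and matching labels are not enough without the toleration.
    print(schedulable({"role": "db"}, db_taint, [], {"role": "db"}))             # False
    print(schedulable({"role": "db"}, db_taint, db_toleration, {"role": "db"}))  # True
```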

What are best practices for defining node affinity labels?

Use clear, descriptive label keys (e.g., node.kubernetes.io/instance-type, topology.kubernetes.io/zone). Prefer built-in labels where possible. Document the purpose of custom labels. Combine node affinity with pod anti-affinity carefully to avoid over-constraining the scheduler. Test scenarios with node failures.

Integrate Kyverno CLI into CI/CD Pipelines with GitHub Actions for Kubernetes Policy Checks

Introduction

As Kubernetes clusters become an integral part of infrastructure, maintaining compliance with security and configuration policies is crucial. Kyverno, a policy engine designed for Kubernetes, can be integrated into your CI/CD pipelines to enforce configuration standards and automate policy checks. In this article, we’ll walk through integrating Kyverno CLI with GitHub Actions, providing a seamless workflow for validating Kubernetes manifests before they reach your cluster.

What is Kyverno CLI?

Kyverno is a Kubernetes-native policy management tool, enabling users to enforce best practices, security protocols, and compliance across clusters. Kyverno CLI is a command-line interface that lets you apply, test, and validate policies against YAML manifests locally or in CI/CD pipelines. By integrating Kyverno CLI with GitHub Actions, you can automate these policy checks, ensuring code quality and compliance before deploying resources to Kubernetes.

Benefits of Using Kyverno CLI in CI/CD Pipelines

Integrating Kyverno into your CI/CD workflow provides several advantages:

  1. Automated Policy Validation: Detect policy violations early in the CI/CD pipeline, preventing misconfigured resources from deployment.
  2. Enhanced Security Compliance: Kyverno enables checks for security best practices and compliance frameworks.
  3. Faster Development: Early feedback on policy violations streamlines the process, allowing developers to fix issues promptly.

Setting Up Kyverno CLI in GitHub Actions

Step 1: Install Kyverno CLI

To use Kyverno in your pipeline, you need to install the Kyverno CLI in your GitHub Actions workflow. You can specify the Kyverno version required for your project or use the latest version.

Here’s a sample GitHub Actions YAML configuration to install Kyverno CLI:

name: CI Pipeline with Kyverno Policy Checks

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  kyverno-policy-check:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Code
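        uses: actions/checkout@v4

      - name: Install Kyverno CLI
        env:
          KYVERNO_VERSION: "<version>"  # pin an exact release, e.g. 1.12.0
        run: |
          # Release assets are named kyverno-cli_v<version>_linux_x86_64.tar.gz
          curl -LO "https://github.com/kyverno/kyverno/releases/download/v${KYVERNO_VERSION}/kyverno-cli_v${KYVERNO_VERSION}_linux_x86_64.tar.gz"
          tar -xzf "kyverno-cli_v${KYVERNO_VERSION}_linux_x86_64.tar.gz"
          sudo mv kyverno /usr/local/bin/
          kyverno version

Replace <version> with the Kyverno CLI release you wish to use (without the leading v). Because the version is part of the release asset's file name, you cannot simply substitute latest in this URL; pinning an explicit version also keeps pipeline results reproducible.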

Step 2: Define Policies for Validation

Create a directory in your repository to store Kyverno policies. These policies define the standards that your Kubernetes resources should comply with. For example, create a directory structure as follows:

.
└── .github
    └── policies
        ├── disallow-latest-tag.yaml
        └── require-requests-limits.yaml

Each policy is defined in YAML format and can be customized to meet specific requirements. Below are examples of policies that might be used:

  • Disallow latest Tag in Images: Prevents the use of the latest tag to ensure version consistency.
  • Enforce CPU/Memory Limits: Ensures resource limits are set for containers, which can prevent resource abuse.
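To make the intent of the two example policies concrete, here is a small Python approximation of the checks. It is not Kyverno and does not reproduce its pattern syntax; it assumes Deployment-style manifests already parsed into dictionaries, treats digest-pinned images as acceptable, and requires both requests and limits for CPU and memory (adjust the hypothetical REQUIRED table to match the policy you actually deploy). The function names are hypothetical.

```python
# Which resource fields this sketch requires; align it with your real policy.
REQUIRED = {"requests": ("cpu", "memory"), "limits": ("cpu", "memory")}


def image_violations(image: str) -> list:
    """Flag images without a tag or with the latest tag."""
    if "@" in image:
        return []  # pinned by digest, which is immutable; accepted here
    name = image.rsplit("/", 1)[-1]  # drop registry host:port and path
    if ":" not in name:
        return [f"{image}: no tag (defaults to latest)"]
    if name.split(":", 1)[1] == "latest":
        return [f"{image}: uses the latest tag"]
    return []


def resource_violations(container: dict) -> list:
    """Flag missing CPU/memory requests and limits on one container."""
    resources = container.get("resources", {})
    return [
        f"{container['name']}: missing resources.{section}.{kind}"
        for section, kinds in REQUIRED.items()
        for kind in kinds
        if kind not in resources.get(section, {})
    ]


def check_manifest(manifest: dict) -> list:
    """Run both checks on every container of a Deployment-style manifest."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    problems = []
    for container in pod_spec.get("containers", []):
        problems += image_violations(container["image"])
        problems += resource_violations(container)
    return problems


if __name__ == "__main__":
    # The nginx Deployment used as this article's example, as a dict.
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "nginx-deployment"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "nginx", "image": "nginx:latest"}]}}},
    }
    for problem in check_manifest(deployment):
        print(problem)
```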

Step 3: Add a GitHub Actions Step to Validate Manifests

In this step, you’ll use Kyverno CLI to validate Kubernetes manifests against the policies defined in the .github/policies directory. If a manifest fails validation, the pipeline will halt, preventing non-compliant resources from being deployed.

Here’s the YAML configuration to validate manifests:

- name: Validate Kubernetes Manifests
  run: |
    kyverno apply .github/policies -r manifests/

Replace manifests/ with the path to your Kubernetes manifests in the repository. This command applies all policies in .github/policies against each YAML file in the manifests directory, stopping the pipeline if any non-compliant configurations are detected.

Step 4: Handle Validation Results

To make the output of Kyverno CLI more readable, you can use additional GitHub Actions steps to format and handle the results. For instance, you might set up a conditional step to notify the team if any manifest is non-compliant:

- name: Check for Policy Violations
  if: failure()
  run: echo "Policy violation detected. Please review the failed validation."

Alternatively, you could configure notifications to alert your team through Slack, email, or other integrations whenever a policy violation is identified.
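One way to make results more readable is the GitHub Actions job summary: Markdown appended to the file named by the GITHUB_STEP_SUMMARY environment variable is shown on the workflow run's summary page. A minimal sketch, assuming you have already turned the Kyverno output into (manifest, policy, message) tuples; that parsing step is not shown because it depends on the CLI version and output format you use, and the function name is hypothetical.

```python
import os


def write_step_summary(violations, path=None) -> str:
    """Append a Markdown table of policy violations to the job summary.

    violations: iterable of (manifest, policy, message) tuples.
    path: defaults to $GITHUB_STEP_SUMMARY, which GitHub Actions sets per step.
    Returns the Markdown text that was (or would have been) written.
    """
    rows = list(violations)
    lines = ["## Kyverno policy check", ""]
    if not rows:
        lines.append("No policy violations found.")
    else:
        lines += ["| Manifest | Policy | Message |", "| --- | --- | --- |"]
        for row in rows:
            # Escape pipes so messages cannot break the table layout.
            cells = [str(cell).replace("|", "\\|") for cell in row]
            lines.append("| " + " | ".join(cells) + " |")
    text = "\n".join(lines) + "\n"
    target = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if target:
        with open(target, "a", encoding="utf-8") as fh:
            fh.write(text)
    return text


if __name__ == "__main__":
    print(write_step_summary([
        ("manifests/nginx.yaml", "disallow-latest-tag", "image nginx:latest uses the latest tag"),
    ]))
```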

Example: Validating a Kubernetes Manifest

Suppose you have a manifest defining a Kubernetes deployment as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest  # Should trigger a violation

The policy disallow-latest-tag.yaml checks if any container image uses the latest tag and rejects it. When this manifest is processed, Kyverno CLI flags the image and halts the CI/CD pipeline with an error, preventing the deployment of this manifest until corrected.

Conclusion

Integrating Kyverno CLI into a GitHub Actions CI/CD pipeline offers a robust, automated solution for enforcing Kubernetes policies. With this setup, you can ensure Kubernetes resources are compliant with best practices and security standards before they reach production, enhancing the stability and security of your deployments.

Kubernetes Ingress on OpenShift: Routes Explained and When to Use Them

Introduction
OpenShift, Red Hat’s Kubernetes platform, has its own way of exposing services to external clients. In vanilla Kubernetes, you would typically use an Ingress resource along with an ingress controller to route external traffic to services. OpenShift, however, introduced the concept of a Route and an integrated Router (built on HAProxy) early on, before Kubernetes Ingress even existed. Today, OpenShift supports both Routes and standard Ingress objects, which can sometimes lead to confusion about when to use each and how they relate.

This article explores how OpenShift handles Kubernetes Ingress resources, how they translate to Routes, the limitations of this approach, and guidance on when to use Ingress versus Routes.

OpenShift Routes and the Router: A Quick Overview


OpenShift Routes are OpenShift-specific resources designed to expose services externally. They are served by the OpenShift Router, which is an HAProxy-based proxy running inside the cluster. Routes support advanced features such as:

  • Weighted backends for traffic splitting
  • Sticky sessions (session affinity)
  • Multiple TLS termination modes (edge, passthrough, re-encrypt)
  • Wildcard subdomains
  • Custom certificates and SNI
  • Path-based routing

Because Routes are OpenShift-native, the Router understands these features natively and can be configured accordingly. This tight integration enables powerful and flexible routing capabilities tailored to OpenShift environments.
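Weighted backends are the basis for canary releases with Routes: the service in spec.to and those in spec.alternateBackends each get an integer weight from 0 to 256, and the router sends each service roughly its weight divided by the sum of all weights. A minimal arithmetic sketch; the function and service names are hypothetical.

```python
def traffic_split(backends: dict) -> dict:
    """Approximate share of requests per service for a weighted Route.

    backends: service name -> weight (0-256), i.e. spec.to plus
    spec.alternateBackends. A weight of 0 sends that service no traffic.
    """
    for name, weight in backends.items():
        if not 0 <= weight <= 256:
            raise ValueError(f"{name}: weight {weight} outside 0-256")
    total = sum(backends.values())
    if total == 0:
        return {name: 0.0 for name in backends}
    return {name: weight / total for name, weight in backends.items()}


if __name__ == "__main__":
    # Canary: a 90/10 split between the stable and canary services.
    print(traffic_split({"app-stable": 90, "app-canary": 10}))
```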

Using Kubernetes Ingress in OpenShift (Default Behavior)


Starting with OpenShift Container Platform (OCP) 3.10, Kubernetes Ingress resources are supported. When you create an Ingress, OpenShift automatically translates it into an equivalent Route behind the scenes. This means you can use standard Kubernetes Ingress manifests, and OpenShift will handle exposing your services externally by creating Routes accordingly.

Example: Kubernetes Ingress and Resulting Route

Here is a simple Ingress manifest:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /  # ignored by the OpenShift router
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: test-service
            port:
              number: 80

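OpenShift will create a Route similar to the following. The Route name is generated from the Ingress name plus a random suffix, the Ingress is recorded as its owner, and because this Ingress has no spec.tls section, the Route has no TLS configuration:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: example-ingress-xxxxx  # generated; the suffix is random
  ownerReferences:
  - apiVersion: networking.k8s.io/v1
    controller: true
    kind: Ingress
    name: example-ingress
spec:
  host: www.example.com
  path: /testpath
  to:
    kind: Service
    name: test-service
    weight: 100
  port:
    targetPort: 80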

This automatic translation simplifies migration and supports basic use cases without requiring Route-specific manifests.

Tuning Behavior with Annotations (Ingress ➝ Route)

When you use Ingress on OpenShift, only OpenShift-aware annotations are honored during the Ingress ➝ Route translation. Controller-specific annotations for other ingress controllers (e.g., nginx.ingress.kubernetes.io/*) are ignored by the OpenShift Router. The following annotations are commonly used and supported by the OpenShift router to tweak the generated Route:

| Purpose | Annotation | Typical values | Effect on generated Route |
|---|---|---|---|
| TLS termination | route.openshift.io/termination | edge · reencrypt · passthrough | Sets Route spec.tls.termination to the chosen mode. |
| HTTP→HTTPS redirect (edge) | route.openshift.io/insecureEdgeTerminationPolicy | Redirect · Allow · None | Controls spec.tls.insecureEdgeTerminationPolicy (commonly Redirect). |
| Backend load balancing | haproxy.router.openshift.io/balance | roundrobin · leastconn · source | Sets the HAProxy balancing algorithm for the Route. |
| Per-route timeout | haproxy.router.openshift.io/timeout | Duration such as 60s or 5m | Configures the HAProxy timeout for requests on that Route. |
| HSTS header | haproxy.router.openshift.io/hsts_header | e.g. max-age=31536000;includeSubDomains;preload | Injects the HSTS header on responses (edge/re-encrypt). |

Note: Advanced features like weighted backends/canary or wildcard hosts are not expressible via standard Ingress. Use a Route directly for those.

Example: Ingress with OpenShift router annotations

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress-https
  annotations:
    route.openshift.io/termination: edge
    route.openshift.io/insecureEdgeTerminationPolicy: Redirect
    haproxy.router.openshift.io/balance: leastconn
    haproxy.router.openshift.io/timeout: 60s
    haproxy.router.openshift.io/hsts_header: max-age=31536000;includeSubDomains;preload
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: test-service
            port:
              number: 80

This Ingress will be realized as a Route with edge TLS and an automatic HTTP→HTTPS redirect, using least connections balancing and a 60s route timeout. The HSTS header will be added by the router on HTTPS responses.
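
Because unsupported annotations are silently ignored, a quick pre-flight check can catch typos before they reach the cluster. The Python sketch below is a hypothetical helper, not an OpenShift tool: it assumes any key under the route.openshift.io/ or haproxy.router.openshift.io/ prefixes is router-aware (the real router reads specific keys), and the allowed-value sets are just the typical values listed in the table, not an exhaustive list.

```python
import re

# Assumption: treat any key under these prefixes as router-aware; the real router
# reads specific keys, so this is only an approximation.
HONORED_PREFIXES = ("route.openshift.io/", "haproxy.router.openshift.io/")

# Typical values from the annotation table (not an exhaustive list).
ALLOWED_VALUES = {
    "route.openshift.io/termination": {"edge", "reencrypt", "passthrough"},
    "route.openshift.io/insecureEdgeTerminationPolicy": {"Redirect", "Allow", "None"},
    "haproxy.router.openshift.io/balance": {"roundrobin", "leastconn", "source"},
}

# HAProxy-style durations: a number with an optional unit (us, ms, s, m, h, d).
TIMEOUT_PATTERN = re.compile(r"^\d+(us|ms|s|m|h|d)?$")


def review_annotations(annotations):
    """Split annotations into honored and ignored keys, and flag suspicious values."""
    honored, ignored, problems = {}, [], []
    for key, value in annotations.items():
        if not key.startswith(HONORED_PREFIXES):
            ignored.append(key)  # e.g. nginx.ingress.kubernetes.io/* on OpenShift
            continue
        honored[key] = value
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and value not in allowed:
            problems.append(f"{key}: unexpected value {value!r}")
        if key == "haproxy.router.openshift.io/timeout" and not TIMEOUT_PATTERN.match(value):
            problems.append(f"{key}: {value!r} is not a duration like 60s or 5m")
    return honored, ignored, problems
```

Running it over the annotations of the example Ingress reports no problems; adding an nginx.ingress.kubernetes.io/* key would show up in the ignored list.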

Limitations of Using Ingress to Generate Routes
While convenient, using Ingress to generate Routes has limitations:

  • Missing advanced features: Weighted backends are configured through Route fields (spec.alternateBackends) that Ingress cannot express, and sticky-session tuning likewise relies on Route-specific settings.
  • TLS passthrough and re-encrypt modes: The standard Ingress spec has no way to express these modes; on OpenShift they are available only through the route.openshift.io/termination annotation, which other Kubernetes platforms ignore.
  • Ingress without host: OpenShift generates Routes only for Ingress rules that specify a host; rules without one are not translated.
  • Wildcard hosts: Wildcard hosts (e.g., *.example.com) are only supported via Routes, not Ingress.
  • Annotation compatibility: Some OpenShift Route annotations do not have equivalents in Ingress, leading to configuration gaps.
  • Protocol support: Ingress handles only HTTP and HTTPS, while a Route with passthrough termination can carry other TLS-encrypted protocols, routed by SNI.
  • Config drift risk: Because Routes created from Ingress are managed by OpenShift, manual edits to the generated Route may be overwritten or cause inconsistencies.

These limitations mean that for advanced routing configurations or OpenShift-specific features, using Routes directly is preferable.

When to Use Ingress vs. When to Use Routes
Choosing between Ingress and Routes depends on your requirements:

  • Use Ingress if:
      • You want portability across Kubernetes platforms.
      • You have existing Ingress manifests and want to minimize changes.
      • Your application uses only basic HTTP or HTTPS routing.
      • You prefer platform-neutral manifests for CI/CD pipelines.
  • Use Routes if:
      • You need advanced routing features like weighted backends, sticky sessions, or multiple TLS termination modes.
      • Your deployment is OpenShift-specific and can leverage OpenShift-native features.
      • You want full access to OpenShift router capabilities without depending on the Ingress-to-Route translation.
      • You need to expose non-HTTP traffic over TLS (routed by SNI) or use TLS passthrough/re-encrypt modes.
      • You want to use wildcard hosts or custom annotations not supported by Ingress.

In many cases, teams use a combination: Ingress for portability and Routes for advanced or OpenShift-specific needs.

Conclusion


On OpenShift, Kubernetes Ingress resources are automatically converted into Routes, enabling basic external service exposure with minimal effort. This allows users to leverage existing Kubernetes manifests and maintain portability. However, for advanced routing scenarios and to fully utilize OpenShift’s powerful Router features, using Routes directly is recommended.

Both Ingress and Routes coexist seamlessly on OpenShift, allowing you to choose the right tool for your application’s requirements.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Talos Linux: The Immutable, API-Driven OS for Kubernetes (Deep Dive)

Every Kubernetes cluster runs on Linux. But the distribution you choose for your nodes determines how much time you spend patching, hardening, debugging SSH sessions, and dealing with configuration drift across your fleet. General-purpose distributions like Ubuntu and Debian were designed to run anything: web servers, desktops, databases, and yes, Kubernetes. That flexibility is also their biggest liability when your only job is running containers.

Talos Linux takes a radically different approach. It strips away everything a Kubernetes node does not need: there is no shell, no SSH daemon, no package manager, and no way to log in interactively. The entire operating system is managed through an API, and every change is declarative. If that sounds extreme, it is. But it solves real problems that traditional distributions cannot address without layers of additional tooling.

This guide is a comprehensive deep dive into Talos Linux: what it is, how its architecture works, how it compares to alternatives like Flatcar and Bottlerocket, how to install and operate it, and when you should (and should not) use it. Whether you are evaluating Talos for a production fleet or a homelab, this is everything you need to make an informed decision.

What Is Talos Linux

Talos Linux is a minimal, immutable operating system designed exclusively to run Kubernetes. It is developed by Sidero Labs and distributed as a single system image that boots into a Kubernetes-ready state. There is no general-purpose userland. No bash shell. No ability to SSH into a node and run commands. Every aspect of machine configuration — from network settings to Kubernetes component flags — is expressed in a YAML document called the machine config and applied through an authenticated gRPC API.

The core design principles are:

  • Immutable — The root filesystem is read-only and mounted from a SquashFS image. You cannot install packages, modify system binaries, or alter the OS at runtime.
  • API-driven — All management happens through talosctl, a CLI that communicates with the Talos API over mutual TLS. There is no SSH and no interactive console.
  • Minimal — The OS ships only what Kubernetes needs: a Linux kernel, containerd, the kubelet, etcd (on control plane nodes), and the Talos machinery. The installed image is roughly 80 MB.
  • Declarative — The desired machine state is defined in a YAML config. Applying a new config converges the node to the desired state, similar to how Kubernetes reconciles workloads.
  • Secure by default — No shell access means no attack vector through compromised credentials. All API communication requires mutual TLS authentication. The attack surface is drastically smaller than any traditional distribution.
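
The declarative principle can be illustrated with a toy reconciler: diff the desired state against the current state and apply only the difference. This Python sketch is purely conceptual; Talos's real controllers work on typed resources, and the flat keys used here (hostname, kubelet.max-pods, ntp) are hypothetical.

```python
from __future__ import annotations


def plan_changes(current: dict, desired: dict) -> list[tuple[str, str, object]]:
    """Diff two flat config maps into add/update/remove actions (sorted for determinism)."""
    actions: list[tuple[str, str, object]] = []
    for key in sorted(desired.keys() - current.keys()):
        actions.append(("add", key, desired[key]))
    for key in sorted(desired.keys() & current.keys()):
        if desired[key] != current[key]:
            actions.append(("update", key, desired[key]))
    for key in sorted(current.keys() - desired.keys()):
        actions.append(("remove", key, None))
    return actions


def converge(current: dict, desired: dict) -> dict:
    """Apply the planned actions; the result always equals the desired state."""
    state = dict(current)
    for action, key, value in plan_changes(current, desired):
        if action == "remove":
            del state[key]
        else:
            state[key] = value
    return state
```

Planning again against an already-converged state yields no actions, which is the idempotence that makes re-applying the same config safe.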

Talos supports bare metal, VMware vSphere, AWS, Azure, GCP, Hetzner, Equinix Metal, Oracle Cloud, and several other platforms. It also runs on single-board computers like Raspberry Pi and NVIDIA Jetson, making it viable for edge deployments. For a broader perspective on how immutable infrastructure fits into the Kubernetes ecosystem, see our Kubernetes security best practices guide.

Architecture Deep Dive

Understanding Talos at an architectural level is essential before deploying it. The design choices are unconventional compared to what most Linux administrators expect, and they explain both its strengths and its constraints.

The machined Daemon and API-Driven Management

At the heart of Talos is machined, a single PID-1 process that replaces systemd, init, and every other service manager. When a Talos node boots, machined starts, reads its machine configuration, and orchestrates the entire lifecycle: networking, disk setup, containerd, the kubelet, and etcd (on control plane nodes).

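The Talos API is served on port 50000 by apid, which authenticates requests and forwards them to machined; in maintenance mode, the same port accepts the unauthenticated (--insecure) connection used to push a node's first config. Port 50001 is used by trustd on control plane nodes to issue certificates to nodes joining the cluster. This API is the only way to interact with the node. The talosctl CLI is the primary client, authenticating with mutual TLS certificates generated by talosctl gen config.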

Key API operations include:

  • talosctl apply-config — Push a new or updated machine configuration.
  • talosctl upgrade — Trigger an in-place OS upgrade.
  • talosctl dmesg — Stream kernel messages in real time.
  • talosctl logs — Read logs from any Talos service (etcd, kubelet, containerd).
  • talosctl get — Inspect resource state (network interfaces, disks, services).
  • talosctl reset — Wipe a node and return it to maintenance mode.

This API-first model eliminates configuration drift by design. There is no way for an operator to SSH into a node, run an ad-hoc command, and leave the system in an undocumented state. Every change flows through the same declarative path.

System Partitions Layout

Talos partitions the disk into a well-defined layout that separates immutable system data from mutable state:

Partition | Purpose | Mutable
EFI | EFI System Partition for UEFI boot | No
BIOS | BIOS boot partition (legacy boot) | No
BOOT | Contains the kernel and initramfs | No (replaced during upgrades)
META | Stores metadata like machine UUID and upgrade status | Limited
STATE | Holds the machine configuration and PKI material | Yes (managed by machined)
EPHEMERAL | Mounted at /var; stores containerd images, kubelet data, etcd data, and pod logs | Yes (wiped on reset)

The STATE partition is critical: it persists the machine config and TLS certificates across reboots and upgrades. The EPHEMERAL partition holds everything that can be reconstructed — container images, pod volumes (emptyDir), and etcd data on control plane nodes. When you run talosctl reset, the EPHEMERAL partition is wiped, but STATE can optionally be preserved.

This layout means that an OS upgrade replaces the BOOT partition contents (kernel + initramfs) while leaving your machine configuration and Kubernetes state untouched. If an upgrade fails, Talos rolls back to the previous BOOT image automatically.
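
The rollback guarantee comes from never touching the running image. Here is a toy Python model of an A/B scheme that makes this explicit; the class name BootSlots, the version strings, and the boots_healthy callback are hypothetical, and the real bootloader logic is more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class BootSlots:
    """Toy model of A/B boot images; version strings are hypothetical."""
    active: str = "A"
    images: Dict[str, Optional[str]] = field(default_factory=lambda: {"A": "v1.8.0", "B": None})

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def upgrade(self, new_image: str, boots_healthy: Callable[[str], bool]) -> bool:
        target = self.inactive
        self.images[target] = new_image  # only the inactive slot is written
        if boots_healthy(new_image):
            self.active = target  # commit: future boots use the new slot
            return True
        return False  # rollback: the active slot and its image were never touched
```

A failed upgrade leaves slot A active with its original image; a successful one flips the active slot, so the old image becomes the fallback for the next upgrade.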

Boot Process and Kubernetes Bootstrapping

The Talos boot sequence is deterministic and fast, typically completing in under 60 seconds on modern hardware:

  1. Firmware → Bootloader — UEFI or BIOS loads GRUB, which loads the Talos kernel and initramfs.
  2. Kernel init → machined — The kernel starts machined as PID 1. There is no init system in between.
  3. Machine config discovery — machined checks the STATE partition for an existing config. If none is found (first boot), it enters maintenance mode and listens on the maintenance API for a config to be applied.
  4. Network configuration — Networking is brought up based on the machine config (DHCP or static).
  5. Disk setup — Partitions are created or validated. The EPHEMERAL partition is formatted if missing.
  6. containerd starts — The container runtime is launched.
  7. etcd starts (control plane only) — etcd is started and joins the existing cluster, or waits for a bootstrap command.
  8. kubelet starts — The kubelet registers the node with the Kubernetes API server.

The first control plane node requires a one-time bootstrap command (talosctl bootstrap) to initialize the etcd cluster and generate the Kubernetes control plane static pods. Subsequent control plane nodes join automatically.
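
The branching in steps 3 and 7 can be summarized with a small Python model. It only encodes the order of the steps listed above; the function name boot_phases and the phase labels are hypothetical, and real Talos is driven by controllers rather than a strict script.

```python
from typing import List


def boot_phases(has_saved_config: bool, role: str, bootstrapped: bool = True) -> List[str]:
    """Return the boot steps listed above for a node (simplified, illustrative)."""
    phases = ["firmware -> bootloader", "kernel -> machined (PID 1)"]
    if not has_saved_config:
        # First boot: no config on STATE, so the node waits for apply-config.
        return phases + ["maintenance mode (waiting for config)"]
    phases += ["network", "disks", "containerd"]
    if role == "controlplane":
        # Only the very first control plane node needs talosctl bootstrap.
        phases.append("etcd" if bootstrapped else "etcd (waiting for talosctl bootstrap)")
    phases.append("kubelet")
    return phases
```

Workers never include an etcd phase, and an unconfigured node stops at maintenance mode until a config is pushed.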

Security Model: No SSH, Mutual TLS, API-Only

Talos Linux implements a zero-trust security model at the OS level. Every API request is authenticated using mutual TLS (mTLS). When you generate a cluster configuration with talosctl gen config, it produces a Certificate Authority (CA) that signs both the client (operator) and server (node) certificates.

The security implications are significant:

  • No shell access — There is no /bin/sh, no /bin/bash, no login capability. Even if an attacker gains network access to the node, there is no shell to exploit.
  • No SSH daemon — Port 22 is not open. There is no sshd binary on the system.
  • No package manager — You cannot install tools, backdoors, or persistence mechanisms on the host.
  • Read-only rootfs — Even with theoretical root access, the filesystem cannot be modified.
  • Mutual TLS everywhere — The Talos API, etcd communication, and inter-node trust all use mTLS. Certificates can be rotated without downtime.

This does not make Talos invulnerable — kernel exploits and container escape vulnerabilities still apply. But it eliminates the most common attack vectors in Kubernetes node compromise: SSH credential theft, unauthorized package installation, and persistent rootkits.
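
The Talos API is gRPC implemented in Go, so the following is not its code. It is a Python standard-library sketch of what "mutual" TLS means in practice: the server refuses clients that lack a CA-signed certificate, and the client verifies the server while presenting its own certificate. The file paths these helpers accept are hypothetical stand-ins for the CA, certificate, and key material that talosconfig carries.

```python
import ssl
from typing import Optional


def server_context(ca_file: Optional[str] = None, cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS server context that rejects clients without a certificate signed by the CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # the "mutual" part: client certs are mandatory
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)  # the node's own certificate
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust only the cluster CA
    return ctx


def client_context(ca_file: Optional[str] = None, cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that verifies the server and presents a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # CERT_REQUIRED and hostname checks by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)  # the operator's client certificate
    return ctx
```

Without ssl.CERT_REQUIRED on the server side this would be ordinary one-way TLS, where anyone who can reach the port could open a session.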

Talos Linux vs Alternatives: Comparison Table

Choosing a node OS depends on your operational model, cloud provider, and team experience. Here is how Talos Linux compares to the most common alternatives for Kubernetes node operating systems.

Feature | Talos Linux | Ubuntu / Debian | Flatcar Container Linux | Bottlerocket (AWS) | RancherOS / k3OS
Mutability | Fully immutable rootfs | Fully mutable | Immutable rootfs, writable /etc | Immutable rootfs | Mostly immutable
SSH access | None (no sshd) | Yes (default) | Yes (default) | Optional (admin container) | Yes
Shell access | None | Full shell | Full shell | Limited (via admin container) | Full shell
Management model | Declarative API (gRPC) | Imperative (apt, SSH) | Declarative (Ignition) + SSH | Declarative (TOML settings API) | cloud-init + SSH
Update mechanism | A/B image swap with rollback | apt upgrade (in-place) | A/B image swap (Nebraska/FLUO) | A/B image swap | Image swap
Container runtime | containerd | containerd or CRI-O | containerd (Docker optional) | containerd | Docker (RancherOS), containerd (k3OS)
Kubernetes integration | Built-in (kubelet, etcd bundled) | Manual (kubeadm, etc.) | Manual (kubeadm, etc.) | EKS-optimized | Built-in (k3s bundled)
Cloud support | AWS, Azure, GCP, Hetzner, bare metal, VMware, and more | All clouds | AWS, Azure, GCP, bare metal, VMware | AWS-focused (VMware and bare-metal variants exist) | Limited
Image size | ~80 MB | ~1-2 GB | ~300 MB | ~200 MB | ~150 MB
Config drift | Impossible (API-only) | Common without tooling | Possible (SSH access) | Low (API + limited shell) | Possible

Talos Linux vs Ubuntu / Debian

Ubuntu and Debian are the default choices for most Kubernetes deployments, especially when using kubeadm or managed installers. They work. But they carry everything a general-purpose OS includes: a package manager, a full shell, hundreds of system services, and thousands of binaries that your Kubernetes nodes never use.

The operational burden is real: you need to patch the OS independently from Kubernetes, harden SSH, configure unattended upgrades, manage user accounts, and run CIS benchmarks to verify compliance. With Talos, these concerns disappear because the attack surface simply does not exist. The trade-off is that you lose the ability to SSH in and debug problems the traditional way.

Talos Linux vs Flatcar Container Linux

Flatcar Container Linux (the successor to CoreOS Container Linux) is the closest philosophical match to Talos. Both use immutable root filesystems and image-based updates. However, Flatcar retains SSH access and a full shell, which means an operator can still log in and make ad-hoc changes. Flatcar uses Ignition for initial provisioning and systemd for service management.

The key difference is that Flatcar is a container-optimized general-purpose OS, while Talos is a Kubernetes-only OS. Flatcar can run arbitrary containers and system services. Talos runs only Kubernetes. If you need SSH as a safety net during your transition to immutable infrastructure, Flatcar is a pragmatic middle ground. If you want to enforce immutability with no escape hatches, Talos is the stronger choice.

Talos Linux vs Bottlerocket

Bottlerocket is AWS’s purpose-built container OS, designed for EKS and ECS. Like Talos, it has an immutable rootfs and an API-driven settings model. Unlike Talos, it provides an optional “admin container” that gives you a shell for debugging, and it is heavily optimized for the AWS ecosystem.

If you run exclusively on AWS with EKS, Bottlerocket is the path of least resistance. If you need a multi-cloud or bare-metal solution with integrated Kubernetes bootstrapping, Talos is significantly more flexible. Bottlerocket also does not bootstrap Kubernetes itself — it relies on EKS or an external installer.

Talos Linux vs RancherOS / k3OS

RancherOS and k3OS were early attempts at minimal container-focused Linux distributions. RancherOS ran the entire system as Docker containers. k3OS bundled k3s (lightweight Kubernetes) into the OS. Both projects have been deprecated or are in maintenance mode. Talos is the actively developed, production-grade successor to this category. If you are currently running k3OS, Talos is the natural migration path.

Installation and Cluster Bootstrap

Setting up a Talos cluster follows a consistent workflow regardless of the platform: generate configs, boot nodes, apply configs, bootstrap. Here is a step-by-step walkthrough.

Step 1: Install talosctl

Download the talosctl binary for your platform. On macOS with Homebrew:

brew install siderolabs/tap/talosctl

On Linux:

curl -sL https://talos.dev/install | sh

Step 2: Generate Machine Configurations

The talosctl gen config command generates a full set of machine configurations: one for control plane nodes, one for workers, and a talosconfig file containing the client credentials.

talosctl gen config my-cluster https://10.0.0.10:6443 \
  --output-dir _out

This creates three files in the _out directory:

  • controlplane.yaml — Machine config for control plane nodes.
  • worker.yaml — Machine config for worker nodes.
  • talosconfig — Client configuration with the CA certificate and client key for mTLS authentication.

The endpoint URL (https://10.0.0.10:6443) should point to the Kubernetes API server address — either a load balancer VIP or the IP of your first control plane node.

Step 3: Boot Nodes with Talos

How you boot depends on the platform:

  • Bare metal — Write the Talos ISO or disk image to a USB drive or PXE boot. The node boots into maintenance mode, waiting for a config.
  • VMware — Deploy the OVA template, or use the ISO in a VM. Talos provides official OVA images.
  • AWS — Use the official Talos AMI. Launch EC2 instances with the AMI and pass the machine config as user-data.
  • Azure / GCP — Use the official images from Sidero Labs’ image factory. Pass the machine config through the platform’s metadata service.

Step 4: Apply Configuration and Bootstrap

Once nodes are booted and in maintenance mode, apply the machine configs:

# Configure talosctl to use the generated credentials
export TALOSCONFIG="_out/talosconfig"

# Apply config to the first control plane node
talosctl apply-config --insecure \
  --nodes 10.0.0.10 \
  --file _out/controlplane.yaml

# Apply config to worker nodes
talosctl apply-config --insecure \
  --nodes 10.0.0.20 \
  --file _out/worker.yaml

The --insecure flag is required for the initial config application because the node does not yet have TLS certificates. After the config is applied, all subsequent communication uses mTLS.
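
With more than a couple of nodes, it helps to generate these calls rather than type them. The Python sketch below only builds the same talosctl apply-config commands shown above for a list of node IPs and prints them (dry run by default); the helper names and the extra worker IP are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def apply_config_commands(controlplanes: Sequence[str], workers: Sequence[str],
                          out_dir: str = "_out") -> List[List[str]]:
    """Build the first-time apply-config calls shown above for every node."""
    commands = []
    for ip in controlplanes:
        commands.append(["talosctl", "apply-config", "--insecure",
                         "--nodes", ip, "--file", f"{out_dir}/controlplane.yaml"])
    for ip in workers:
        commands.append(["talosctl", "apply-config", "--insecure",
                         "--nodes", ip, "--file", f"{out_dir}/worker.yaml"])
    return commands


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))  # show the exact command line
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing node
```

For example, run_all(apply_config_commands(["10.0.0.10"], ["10.0.0.20", "10.0.0.21"])) prints three commands; pass dry_run=False to execute them.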

Now bootstrap the Kubernetes cluster from the first control plane node:

# Set the endpoint and node
talosctl config endpoint 10.0.0.10
talosctl config node 10.0.0.10

# Bootstrap etcd and the control plane
talosctl bootstrap

This command initializes etcd, generates the Kubernetes PKI, and starts the control plane static pods. Within a minute or two, the Kubernetes API server is available.

Step 5: Retrieve kubeconfig and Verify

# Get the kubeconfig
talosctl kubeconfig -n 10.0.0.10

# Verify the cluster
kubectl get nodes
kubectl get pods -A

Essential talosctl Commands

Once the cluster is running, these are the commands you will use daily:

# Check node health
talosctl health --nodes 10.0.0.10

# Stream kernel messages (equivalent to dmesg -w)
talosctl dmesg --nodes 10.0.0.10 --follow

# View service logs
talosctl logs kubelet --nodes 10.0.0.10
talosctl logs etcd --nodes 10.0.0.10

# List running services
talosctl services --nodes 10.0.0.10

# Get machine config (current running config)
talosctl get machineconfig --nodes 10.0.0.10

# Inspect resource state
talosctl get members --nodes 10.0.0.10
talosctl get addresses --nodes 10.0.0.10

Day-2 Operations

Installation is only the beginning. The real value of Talos emerges in day-2 operations: upgrades, config changes, and cluster maintenance. This is where the declarative, API-driven model pays dividends.

Upgrading Talos Linux

Talos upgrades are performed node by node through the API. The process downloads the new OS image, writes it to the inactive boot partition, and reboots the node into the new version. If the upgrade fails, the node automatically rolls back to the previous image.

# Upgrade a single node
talosctl upgrade --nodes 10.0.0.10 \
  --image ghcr.io/siderolabs/installer:v1.9.0

# Upgrade with --preserve to keep the EPHEMERAL partition
talosctl upgrade --nodes 10.0.0.10 \
  --image ghcr.io/siderolabs/installer:v1.9.0 \
  --preserve

For production clusters, follow this sequence: upgrade control plane nodes one at a time, verify etcd health after each, then upgrade workers in a rolling fashion. The --preserve flag is important if you want to keep downloaded container images and avoid re-pulling everything after the reboot.
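
The "one control plane node at a time" rule follows from etcd quorum: a cluster of n members needs floor(n/2) + 1 of them up. The Python sketch below computes that tolerance and builds an ordered upgrade plan using only the talosctl upgrade and talosctl health commands shown in this article; the helper names and IPs are hypothetical.

```python
from typing import List, Sequence


def etcd_fault_tolerance(members: int) -> int:
    """How many etcd members can be down while the cluster keeps quorum."""
    quorum = members // 2 + 1
    return members - quorum


def rolling_upgrade_plan(controlplanes: Sequence[str], workers: Sequence[str],
                         image: str) -> List[List[str]]:
    """Control plane nodes first, then workers, one at a time with a health check after each."""
    plan = []
    for ip in list(controlplanes) + list(workers):
        plan.append(["talosctl", "upgrade", "--nodes", ip, "--image", image, "--preserve"])
        plan.append(["talosctl", "health", "--nodes", ip])  # wait until the cluster is healthy again
    return plan
```

With three control plane nodes the tolerance is 1, so upgrading two at once would lose quorum; a two-node control plane tolerates no failures at all.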

Upgrading Kubernetes Version

Kubernetes version upgrades are separate from Talos OS upgrades. You can run a newer version of Kubernetes on an older Talos release (within compatibility bounds). The upgrade is triggered through talosctl:

talosctl upgrade-k8s --nodes 10.0.0.10 \
  --to 1.31.0

This command upgrades the control plane components (kube-apiserver, kube-controller-manager, kube-scheduler) and kube-proxy, and then rolls the new kubelet version across all nodes. The components are replaced in place, so unlike an OS upgrade, this step does not reboot the nodes.

Customizing Machine Config with Patches

As your cluster evolves, you will need to modify machine configurations — adding a registry mirror, changing kubelet flags, or configuring network bonding. Talos supports config patches that overlay changes onto the base config without replacing the entire file.

# Create a patch file
cat > kubelet-patch.yaml << 'EOF'
machine:
  kubelet:
    extraArgs:
      max-pods: "150"
    extraMounts:
      - destination: /var/local-storage
        type: bind
        source: /var/local-storage
        options:
          - bind
          - rw
EOF

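# Apply the patch on top of the local worker config and push the result
talosctl apply-config --nodes 10.0.0.20 \
  --file _out/worker.yaml \
  --config-patch @kubelet-patch.yaml

# Alternatively, patch the node's running config in place
talosctl patch machineconfig --nodes 10.0.0.20 \
  --patch @kubelet-patch.yaml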

Patches can also be applied at generation time with talosctl gen config --config-patch, which is ideal for encoding environment-specific overrides into your GitOps pipeline.
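
Talos applies patches as strategic merges. The Python function below is a simplified stand-in that shows the idea: nested maps merge key by key, lists are appended, and scalars are replaced. This is an approximation, not Talos's merge code; real Talos merges some lists by key (for example network interfaces), and merge_patch is a hypothetical name.

```python
import copy
from typing import Any, Dict


def merge_patch(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified merge: maps merge recursively, lists append, scalars are replaced."""
    result = copy.deepcopy(base)  # never mutate the base config
    for key, value in patch.items():
        current = result.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            result[key] = merge_patch(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            result[key] = current + copy.deepcopy(value)  # assumption: lists are appended
        else:
            result[key] = copy.deepcopy(value)
    return result
```

This is why the kubelet patch above only needs to list the keys it changes: existing extraArgs entries survive, and the max-pods entry is added alongside them.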

etcd Management

Talos manages etcd as a first-class service, not as a manually deployed component. Common etcd operations are available through talosctl:

# Check etcd member list
talosctl etcd members --nodes 10.0.0.10

# Take an etcd snapshot (backup)
talosctl etcd snapshot db.snapshot --nodes 10.0.0.10

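# Remove a failed etcd member (use the member ID reported by "talosctl etcd members")
talosctl etcd remove-member --nodes 10.0.0.10 <member-id>

# Ask this node to step down as etcd leader (e.g., before maintenance)
talosctl etcd forfeit-leadership --nodes 10.0.0.10

# Disaster recovery: bootstrap a new etcd cluster from a snapshot
talosctl bootstrap --nodes 10.0.0.10 --recover-from=./db.snapshot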

Regular etcd snapshots are non-negotiable for any production cluster. Automate this with a CronJob that calls the Talos API or runs talosctl etcd snapshot from an external host.
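
For the external-host option, a small script run from cron can take timestamped snapshots and keep only the newest few. This Python sketch wraps the talosctl etcd snapshot command shown above; the file naming scheme, the backup directory, and the retention count are assumptions to adapt to your environment.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List, Optional


def snapshot_path(backup_dir: str, now: Optional[datetime] = None) -> Path:
    now = now or datetime.now(timezone.utc)
    # UTC timestamps sort lexicographically in chronological order.
    return Path(backup_dir) / f"etcd-{now:%Y%m%dT%H%M%SZ}.snapshot"


def take_snapshot(node: str, backup_dir: str, run: Callable = subprocess.run) -> Path:
    path = snapshot_path(backup_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    run(["talosctl", "etcd", "snapshot", str(path), "--nodes", node], check=True)
    return path


def prune_snapshots(backup_dir: str, keep: int) -> List[Path]:
    """Delete all but the newest `keep` snapshots and return the survivors."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    snapshots = sorted(Path(backup_dir).glob("etcd-*.snapshot"))
    for old in snapshots[:-keep]:
        old.unlink()
    return snapshots[-keep:]
```

A typical cron entry would call take_snapshot("10.0.0.10", "/backups/etcd") followed by prune_snapshots("/backups/etcd", keep=14). Copy the files off the host as well, since a snapshot stored next to the cluster does not survive losing that machine.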

Limitations and When NOT to Use Talos Linux

Talos is not the right choice for every environment. Understanding its limitations is just as important as understanding its strengths.

No SSH Debugging

The most immediate pain point: when something goes wrong, you cannot SSH into the node and poke around. You are limited to what the Talos API exposes — logs, dmesg, service status, and resource state. For most Kubernetes issues, this is sufficient. But for low-level kernel or hardware debugging, you may need to boot the node from a different OS temporarily.

Talos does offer a talosctl dashboard command that provides a real-time TUI (text UI) showing CPU, memory, network, and service status. Combined with talosctl logs and talosctl dmesg, you can troubleshoot most problems. But the learning curve is real, especially for teams accustomed to reaching for htop and journalctl.

Learning Curve for Traditional Sysadmins

If your team manages infrastructure through SSH, Ansible playbooks, and shell scripts, Talos requires a fundamental shift in operational practices. There is no way to "just install" a debugging tool on a node. Everything must be done through the API or through Kubernetes workloads (DaemonSets with host-level access). This shift is valuable in the long run, but it requires investment in training and new workflows.

Custom Kernel Modules

Talos ships a specific kernel build with a curated set of modules. If your workload requires a custom kernel module (GPU drivers, specific storage drivers, or out-of-tree network drivers), you need to build a custom Talos image using the Talos image factory or the imager tool. This is supported but adds operational complexity compared to distributions where you can simply apt install a kernel module package.

Sidero Labs provides an Image Factory service that lets you build custom Talos images with additional system extensions (like NVIDIA drivers, iSCSI tools, or ZFS support) through a web interface or API.

Workloads Requiring Host-Level Access

Some workloads expect to interact with the host OS directly: log collectors that read /var/log, monitoring agents that read /proc, or security tools that install kernel modules. Most of these work on Talos (Kubernetes hostPath volumes are available as usual), but some assume a traditional Linux userland that simply does not exist. Evaluate your specific stack before committing.

Real-World Use Cases

Homelab and Learning

Talos is an excellent choice for homelab Kubernetes clusters. It runs on Raspberry Pi 4/5, Intel NUCs, and old laptops. The entire OS fits in minimal storage, and the declarative config model means you can rebuild your cluster from scratch in minutes by reapplying your machine configs. Many homelab operators use Talos with ArgoCD or Flux for a fully GitOps-managed stack.

Edge and Retail

Edge deployments benefit from Talos's small footprint, immutable design, and remote management. A retail chain with 500 store locations running local Kubernetes clusters can manage every node through the Talos API without ever needing physical or SSH access. The A/B upgrade mechanism ensures that a bad update does not brick a remote device.

Production Multi-Cloud Clusters

Talos provides a consistent node OS across AWS, Azure, GCP, and bare metal. This is valuable for organizations that run Kubernetes on multiple providers and want a single operational model for node management. Instead of maintaining separate AMIs, Azure images, and GCP images with different toolchains, you maintain one set of Talos machine configs with platform-specific patches.

Security-Sensitive Environments

For regulated industries (finance, healthcare, government), Talos's security posture simplifies compliance. The absence of SSH, shell, and package management eliminates entire categories of CIS benchmark requirements. Audit teams appreciate that there is no way for a rogue operator to install unauthorized software on the node OS. The immutable image model also simplifies forensics: if the OS hash does not match the known-good image, the node has been tampered with.

Frequently Asked Questions

Can you SSH into Talos Linux?

No. Talos Linux does not include an SSH daemon, a shell, or any interactive login mechanism. All node management is performed through the Talos API using talosctl. This is a deliberate design decision to eliminate the attack surface associated with shell access and prevent configuration drift from ad-hoc changes.

Is Talos Linux free and open source?

Yes. Talos Linux is open source under the Mozilla Public License 2.0. It is developed by Sidero Labs, which also offers Omni — a commercial SaaS platform for managing Talos clusters at scale. The OS itself is fully free to use in production without restrictions.

How do you debug a Talos Linux node without shell access?

Talos provides several debugging tools through its API: talosctl dmesg for kernel messages, talosctl logs <service> for service logs, talosctl dashboard for a real-time system overview, and talosctl get for inspecting resource state (network, disks, services). For deeper debugging, you can run a privileged DaemonSet pod with nsenter to access the host namespace from within Kubernetes.

Can Talos Linux run workloads other than Kubernetes?

No. Talos Linux is purpose-built exclusively for Kubernetes. It does not support running arbitrary containers, system services, or applications outside of the Kubernetes workload model. If you need to run non-Kubernetes workloads on the same host, consider Flatcar Container Linux or a traditional distribution.

What happens if a Talos upgrade fails?

Talos uses an A/B partition scheme for upgrades. The new image is written to the inactive boot partition, and the node reboots into it. If the new image fails to boot successfully (the health check does not pass within the configured timeout), the bootloader automatically reverts to the previous working image on the next reboot. This makes upgrades inherently safe and reversible without manual intervention.

XSLTPlayground.com: Test, Optimize, and Debug XSLT Online in Real Time

Working with XSLT in modern data pipelines and XML-driven systems has always been powerful… but not always easy. Tools are often heavyweight, outdated, or require local setup and complex environments. That’s why I’m thrilled to announce the launch of XSLTPlayground.com — a free, open-source, browser-based XSLT editor designed specifically for real-world use cases.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

No installations. No complexity. Just open your browser and transform.

🚀 Why XSLT Playground?

🔁 Real-time XSLT Transformations for Real-World Scenarios

Unlike legacy tools or limited web demos, XSLT Playground supports complex transformations involving multiple XML sources, parameterized templates, and real feedback. Whether you work on data integration, API gateways, XML-based reporting, or legacy system upgrades, this tool helps you test and iterate quickly.

🧩 Multi-Input Parameter Support

One of the biggest pain points in XSLT testing is simulating real environments. With XSLTPlayground.com, you can define multiple input sources (e.g., data feeds, configuration, or metadata), and pass them into your XSLT in a synchronized way — just like a production data pipeline.

⚙️ Automatic Parameter Synchronization

When you load a stylesheet with required parameters, the Playground automatically detects them and creates input fields for you on the side. All you need to do is fill in the values. This smart feature removes the guesswork and helps avoid runtime errors.

⚡ Performance & Optimization Insights

Need to know if your optimization is working? We display execution time for each transformation, helping you compare versions and choose the faster approach — all without deploying full systems or instrumenting code. While it’s not a benchmarking tool, the feedback is invaluable for real-time tuning.

🌐 100% Free, Web-based, and Open Source

No need to install bulky tools like Oxygen XML or run Eclipse plugins just to test a stylesheet. XSLTPlayground.com is entirely web-based, free, and built to be open and extensible. Want to contribute or host your own version? The source is on GitHub.

🖱️ Drag & Drop Support

Upload your XML or XSLT files by simply dragging them into the browser. All components — inputs, stylesheets, outputs — support drag and drop for faster iteration.

🎨 Pretty Print and Export Options

Your output is automatically pretty-printed for readability, and with just one click you can download your XSLT and transformation result, making it easy to share, archive, or import into larger projects.

🔗 Try it now: https://xsltplayground.com

Whether you’re a developer, data engineer, or working with legacy systems, this is the tool you’ve been waiting for. Say goodbye to the complexity of setting up XSLT tests and say hello to instant transformations — anywhere, anytime.

Want to contribute or follow development? Star the project on GitHub or send feedback directly from the site.

Helm v3.17 Take Ownership Flag: Fix Release Conflicts

Helm has long been the standard for managing Kubernetes applications using packaged charts, bringing a level of reproducibility and automation to the deployment process. However, some operational tasks, such as renaming a release or migrating objects between charts, have traditionally required cumbersome workarounds. With the introduction of the --take-ownership flag in Helm v3.17 (released in January 2025), a long-standing pain point is finally addressed—at least partially.

The take-ownership feature represents the continuing evolution of Helm. Learn about this and other cutting-edge capabilities in our Helm Charts Package Management Guide.

In this post, we will explore:

  • What the --take-ownership flag does
  • Why it was needed
  • The caveats and limitations
  • Real-world use cases where it helps
  • When not to use it

Understanding Helm Release Ownership and Object Management

When Helm installs or upgrades a chart, it adds ownership metadata (one label and two annotations) to every Kubernetes object it manages:

metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: my-release
    meta.helm.sh/release-namespace: default

This metadata serves an important role: Helm uses it to track and manage the resources that belong to each release. As a safeguard, Helm refuses to modify objects owned by another release. If you try, you will see an error like the one below:

Error: Unable to continue with install: Service "provisioner-agent" in namespace "test-my-ns" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "dp-core-infrastructure11": current value is "dp-core-infrastructure"

While this protects users from accidental overwrites, it creates limitations for advanced use cases.
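To make the check concrete, here is a minimal Python sketch of the comparison that produces the error above. It is an illustration, not Helm's Go implementation. The function names (check_ownership, require_value) are hypothetical, and the message format is copied from the error shown above.

```python
from typing import Optional

MANAGED_BY_LABEL = "app.kubernetes.io/managed-by"
RELEASE_NAME_ANNOTATION = "meta.helm.sh/release-name"
RELEASE_NAMESPACE_ANNOTATION = "meta.helm.sh/release-namespace"


def require_value(meta: dict, key: str, expected: str) -> Optional[str]:
    """Return an error string if meta[key] is missing or differs, else None."""
    if key not in meta:
        return f'missing key "{key}": must be set to "{expected}"'
    if meta[key] != expected:
        return f'key "{key}" must equal "{expected}": current value is "{meta[key]}"'
    return None


def check_ownership(obj: dict, release_name: str, release_namespace: str) -> list:
    """Collect ownership validation errors for one Kubernetes object (illustrative model)."""
    metadata = obj.get("metadata", {})
    labels = metadata.get("labels", {})
    annotations = metadata.get("annotations", {})
    errors = []
    err = require_value(labels, MANAGED_BY_LABEL, "Helm")
    if err:
        errors.append(f"label validation error: {err}")
    for key, expected in (
        (RELEASE_NAME_ANNOTATION, release_name),
        (RELEASE_NAMESPACE_ANNOTATION, release_namespace),
    ):
        err = require_value(annotations, key, expected)
        if err:
            errors.append(f"annotation validation error: {err}")
    return errors


# The Service from the error message above, still owned by the old release.
service = {
    "kind": "Service",
    "metadata": {
        "name": "provisioner-agent",
        "namespace": "test-my-ns",
        "labels": {MANAGED_BY_LABEL: "Helm"},
        "annotations": {
            RELEASE_NAME_ANNOTATION: "dp-core-infrastructure",
            RELEASE_NAMESPACE_ANNOTATION: "test-my-ns",
        },
    },
}

for error in check_ownership(service, "dp-core-infrastructure11", "test-my-ns"):
    print(error)
```

Running it against the provisioner-agent Service reproduces the annotation error from the example. The label is acceptable, but the release-name annotation still points at the old release.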

Why --take-ownership Was Needed

Let’s say you want to:

  • Rename an existing Helm release from api-v1 to api.
  • Move a ConfigMap or Service from one chart to another.
  • Rebuild state during GitOps reconciliation when previous Helm metadata has drifted.

Previously, the usual approach was to:

  1. Uninstall the existing release.
  2. Reinstall under the new name.

This approach introduces downtime, and in production systems, that’s often not acceptable.

What the Flag Does

helm upgrade my-release ./my-chart --take-ownership

When this flag is passed, Helm will:

  • Skip the ownership validation for existing objects.
  • Override the labels and annotations to associate the object with the current release.

In practice, this allows you to claim ownership of resources that previously belonged to another release, enabling seamless handovers.
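The metadata rewrite can be sketched in a few lines of Python. This is a hypothetical model of the behaviour described above, not Helm's code. The ownership check is never consulted, and the label and both annotations are overwritten with the new release's values. The take_ownership function and the app-settings ConfigMap are illustrative names.

```python
import copy


def take_ownership(obj: dict, release_name: str, release_namespace: str) -> dict:
    """Return a copy of obj whose Helm ownership metadata points at the new release.

    Illustrative only: no ownership check is performed, the metadata is simply rewritten.
    """
    adopted = copy.deepcopy(obj)  # leave the caller's object untouched
    metadata = adopted.setdefault("metadata", {})
    metadata.setdefault("labels", {})["app.kubernetes.io/managed-by"] = "Helm"
    annotations = metadata.setdefault("annotations", {})
    annotations["meta.helm.sh/release-name"] = release_name
    annotations["meta.helm.sh/release-namespace"] = release_namespace
    return adopted


# A ConfigMap still owned by the old "api-v1" release (hypothetical object).
config_map = {
    "kind": "ConfigMap",
    "metadata": {
        "name": "app-settings",
        "labels": {"app.kubernetes.io/managed-by": "Helm", "tier": "backend"},
        "annotations": {
            "meta.helm.sh/release-name": "api-v1",
            "meta.helm.sh/release-namespace": "default",
        },
    },
}

adopted = take_ownership(config_map, "api", "default")
print(adopted["metadata"]["annotations"]["meta.helm.sh/release-name"])  # api
```

Unrelated labels such as tier are preserved. Only the three ownership keys change.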

⚠️ What It Doesn’t Do

This flag does not:

  • Clean up references from the previous release.
  • Protect you from future uninstalls of the original release (which might still remove shared resources).
  • Allow you to adopt completely unmanaged Kubernetes resources (those not initially created by Helm).

In short, it’s a mechanism for bypassing Helm’s ownership checks, not a full lifecycle manager.

Real-World Helm Take Ownership Use Cases

Let’s go through common scenarios where this feature is useful.

✅ 1. Renaming a Release Without Downtime

Before:

helm uninstall old-name
helm install new-name ./chart

Now:

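helm install new-name ./chart --take-ownership

Because new-name does not exist yet, this is an install rather than an upgrade: helm upgrade fails for a release that has no deployed revisions. The old-name release stays recorded until you retire it.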

✅ 2. Migrating Objects Between Charts

You’re refactoring a large chart into smaller, modular ones and need to reassign certain Service or Secret objects.

This flag allows the new release to take control of the object without deleting or recreating it.

✅ 3. GitOps Drift Reconciliation

If objects were deployed out-of-band or their metadata changed unintentionally, GitOps tooling using Helm can recover without manual intervention using --take-ownership.

Best Practices and Recommendations

  • Use this flag intentionally, and document where it’s applied.
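  • If possible, retire the previous release after migration to avoid confusion, but keep in mind that a plain helm uninstall of that release may still delete the shared resources listed in its manifest.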
  • Monitor Helm’s behavior closely when managing shared objects.
  • For non-Helm-managed resources, continue to use kubectl annotate or kubectl label to manually align metadata.
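For the manual route, you can set the same three metadata keys with a JSON merge patch. The Python sketch below builds the patch and the corresponding kubectl patch command line. It does not talk to a cluster. The resource name my-service and the helper names are hypothetical.

```python
import json
import shlex


def ownership_patch(release_name: str, release_namespace: str) -> dict:
    """Build a JSON merge patch that sets Helm's ownership label and annotations."""
    return {
        "metadata": {
            "labels": {"app.kubernetes.io/managed-by": "Helm"},
            "annotations": {
                "meta.helm.sh/release-name": release_name,
                "meta.helm.sh/release-namespace": release_namespace,
            },
        }
    }


def kubectl_patch_command(kind: str, name: str, namespace: str, patch: dict) -> str:
    """Render a shell-safe kubectl command that applies the merge patch."""
    args = [
        "kubectl", "patch", kind, name, "-n", namespace,
        "--type", "merge", "-p", json.dumps(patch, separators=(",", ":")),
    ]
    return shlex.join(args)


patch = ownership_patch("my-release", "default")
command = kubectl_patch_command("service", "my-service", "default", patch)
print(command)
```

A merge patch only adds or replaces the listed keys, so existing labels and annotations on the object are left in place.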

Conclusion

The --take-ownership flag is a welcome addition to Helm’s CLI arsenal. While not a universal solution, it smooths over many of the rough edges developers and SREs face during release evolution and GitOps adoption.

It brings a subtle but powerful improvement—especially in complex environments where resource ownership isn’t static.

Stay updated with Helm releases, and consider this flag your new ally in advanced release engineering.

Frequently Asked Questions

What does the Helm --take-ownership flag do?

The --take-ownership flag allows Helm to bypass ownership validation and claim control of Kubernetes resources that belong to another release. It rewrites the ownership annotations (meta.helm.sh/release-name and meta.helm.sh/release-namespace) to associate objects with the current release, enabling zero-downtime release renames and chart migrations.

When should I use Helm take ownership?

Use --take-ownership when renaming releases without downtime, migrating objects between charts, or fixing GitOps drift. It’s ideal for production environments where uninstall/reinstall cycles aren’t acceptable. Always document usage and clean up previous releases afterward.

What are the limitations of Helm take ownership?

The flag doesn’t clean up references from previous releases or protect against future uninstalls of the original release. It only works with Helm-managed resources, not completely unmanaged Kubernetes objects. Manual cleanup of old releases is still required.

Is Helm take ownership safe for production use?

Yes, but use it intentionally and carefully. The flag bypasses Helm’s safety checks, so ensure you understand the ownership implications. Test in staging first, document all usage, and monitor for conflicts. Remove old releases after successful migration to avoid confusion.

Which Helm version introduced the take ownership flag?

The --take-ownership flag was introduced in Helm v3.17, released in January 2025. This feature addresses long-standing pain points with release renaming and chart migrations that previously required downtime-inducing uninstall/reinstall cycles.

Extending Kyverno Policies: Creating Custom Rules for Kubernetes Security

Kyverno offers a robust, declarative approach to enforcing security and compliance standards within Kubernetes clusters by allowing users to define and enforce custom policies. For an in-depth look at Kyverno’s functionality, including core concepts and benefits, see my detailed article here. In this guide, we’ll focus on extending Kyverno policies, providing a structured walkthrough of its data model, and illustrating use cases to make the most of Kyverno in a Kubernetes environment.

Understanding the Kyverno Policy Data Model

Kyverno policies consist of several components that define how the policy should behave, which resources it should affect, and the specific rules that apply. Let’s dive into the main parts of the Kyverno policy model:

  1. Policy Definition: This is the root configuration where you define the policy’s metadata and scope. A namespaced Policy applies only within its own namespace. A ClusterPolicy enforces uniform standards across the entire Kubernetes cluster.
  2. Rules: Policies are made up of rules that dictate what conditions Kyverno should enforce. Each rule can include logic for validation, mutation, or generation based on your needs.
  3. Match and Exclude Blocks: These sections allow fine-grained control over which resources the policy applies to. You can specify resources by their kinds (e.g., Pods, Deployments), namespaces, labels, and even specific names. This flexibility is crucial for creating targeted policies that impact only the resources you want to manage.
    1. Match block: Defines the conditions under which the rule applies to specific resources.
    2. Exclude block: Explicitly omits resources that match certain conditions, so that resources you want to leave alone are not inadvertently affected.
  4. Validation, Mutation, and Generation Actions: Each rule can take different types of actions:
    1. Validation: Ensures resources meet specific criteria. In Enforce mode, non-compliant resources are blocked. In the default Audit mode, violations are only reported.
    2. Mutation: Adjusts resource configurations to align with predefined standards, which is useful for auto-remediation.
    3. Generation: Creates or manages additional resources based on existing resource configurations.
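Match and exclude blocks work together as a simple predicate: a rule applies when the match block selects a resource and the exclude block does not. The Python sketch below is a deliberately simplified, hypothetical model that only looks at kinds and namespaces. It ignores labels, names, operations, and Kyverno's automatic rule generation for Pod controllers such as Deployments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceFilter:
    """Hypothetical, simplified stand-in for a Kyverno match or exclude block."""
    kinds: frozenset = frozenset()
    namespaces: frozenset = frozenset()  # empty means "any namespace"

    def selects(self, resource: dict) -> bool:
        if self.kinds and resource["kind"] not in self.kinds:
            return False
        namespace = resource.get("metadata", {}).get("namespace")
        if self.namespaces and namespace not in self.namespaces:
            return False
        return True


def rule_applies(resource: dict, match: ResourceFilter,
                 exclude: Optional[ResourceFilter] = None) -> bool:
    """A rule applies when match selects the resource and exclude does not."""
    if not match.selects(resource):
        return False
    return exclude is None or not exclude.selects(resource)


match = ResourceFilter(kinds=frozenset({"Pod"}))
exclude = ResourceFilter(kinds=frozenset({"Pod"}), namespaces=frozenset({"kube-system"}))

app_pod = {"kind": "Pod", "metadata": {"name": "web", "namespace": "default"}}
system_pod = {"kind": "Pod", "metadata": {"name": "coredns", "namespace": "kube-system"}}
deployment = {"kind": "Deployment", "metadata": {"name": "web", "namespace": "default"}}

for resource in (app_pod, system_pod, deployment):
    print(resource["kind"], resource["metadata"]["name"], rule_applies(resource, match, exclude))
```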

Example: Restricting Container Image Sources to Docker Hub

A common security requirement is to limit container images to trusted registries. The example below demonstrates a policy that only permits images from Docker Hub.

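apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-dockerhub-images
spec:
  # Without this setting Kyverno defaults to Audit and only reports violations.
  validationFailureAction: Enforce
  rules:
    - name: only-dockerhub-images
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Only Docker Hub images are allowed."
        pattern:
          spec:
            containers:
              # Every container's image must start with docker.io/
              - image: "docker.io/*"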

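This policy targets all Pod resources in the cluster and requires every container image reference to start with docker.io/. When the policy runs in Enforce mode, Kyverno rejects any Pod that uses an image from another registry, reinforcing secure sourcing practices. The comparison is a literal string match. Short names such as nginx:1.25, which most container runtimes resolve to Docker Hub, are rejected unless written in full as docker.io/library/nginx:1.25.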
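You can check the effect of the wildcard with a small Python approximation of Kyverno's wildcard semantics, where * matches any run of characters and ? matches exactly one. The helper names are hypothetical. The sketch models only plain wildcards, not Kyverno's other operators.

```python
import re


def wildcard_match(pattern: str, value: str) -> bool:
    """Approximate Kyverno-style wildcards: '*' = any run of characters, '?' = exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def images_allowed(pod: dict, pattern: str = "docker.io/*") -> bool:
    """The single-element containers pattern must hold for every container."""
    return all(wildcard_match(pattern, c["image"]) for c in pod["spec"]["containers"])


for image in ("docker.io/library/nginx:1.25", "nginx:1.25", "ghcr.io/example/app:1.0"):
    print(image, wildcard_match("docker.io/*", image))
```

Only the fully qualified reference passes. The short name fails even though it would normally be pulled from Docker Hub.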

Practical Use-Cases for Kyverno Policies

Kyverno policies can handle a variety of Kubernetes management tasks through validation, mutation, and generation. Let’s explore examples for each type to illustrate Kyverno’s versatility:

1. Validation Policies

Validation policies in Kyverno ensure that resources comply with specific configurations or security standards, stopping any non-compliant resources from deploying.

Use-Case: Enforcing Resource Limits for Containers

This example rejects (in Enforce mode) any Pod that lacks resource limits, ensuring that every container specifies both CPU and memory limits.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-resource-limits
spec:
  # Without this setting Kyverno defaults to Audit and only reports violations.
  validationFailureAction: Enforce
  rules:
    - name: require-resource-limits
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Resource limits (CPU and memory) are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    # "?*" means "any non-empty value"
                    cpu: "?*"
                    memory: "?*"

By enforcing resource limits, this policy helps prevent resource contention in the cluster, fostering stable and predictable performance.
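The "?*" value reads as "at least one character", and Kyverno checks a single-element list pattern under containers against every container. The Python sketch below approximates that logic for the two limits. The function names and sample Pods are hypothetical, and it assumes quantity values are strings, as they usually appear in manifests.

```python
REQUIRED_LIMITS = ("cpu", "memory")


def container_has_limits(container: dict) -> bool:
    """Mirror the "?*" pattern: each required limit must be present and non-empty."""
    limits = container.get("resources", {}).get("limits", {})
    return all(limits.get(key) not in (None, "") for key in REQUIRED_LIMITS)


def pod_has_limits(pod: dict) -> bool:
    """The containers pattern is checked against every container in the Pod."""
    return all(container_has_limits(c) for c in pod["spec"]["containers"])


compliant = {"spec": {"containers": [
    {"name": "app", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
]}}
missing_memory = {"spec": {"containers": [
    {"name": "app", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
    {"name": "sidecar", "resources": {"limits": {"cpu": "100m"}}},
]}}

print(pod_has_limits(compliant), pod_has_limits(missing_memory))
```

A single non-compliant container, such as the sidecar above, is enough to fail the whole Pod.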

2. Mutation Policies

Mutation policies allow Kyverno to automatically adjust configurations in resources to meet compliance requirements. This approach is beneficial for consistent configurations without manual intervention.

Use-Case: Adding Default Labels to Pods

This policy adds a default label, environment: production, to all new Pods that lack this label, ensuring that resources align with organization-wide labeling standards.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-label
spec:
  rules:
    - name: add-environment-label
      match:
        resources:
          kinds:
            - Pod
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              # +() adds the label only when the Pod does not already set it.
              +(environment): "production"

This mutation policy is an example of how Kyverno can standardize resource configurations at scale by dynamically adding missing information, reducing human error and ensuring labeling consistency.
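How the patch is written determines whether an existing label survives. A plain key in patchStrategicMerge sets or overwrites the value. Kyverno's +(key) add-if-not-present anchor only fills in a missing key. The hypothetical Python helper below sketches both behaviours for a flat label map.

```python
def merge_labels(existing: dict, patch: dict, add_if_absent: bool) -> dict:
    """Apply a flat label patch.

    add_if_absent=False models a plain key in patchStrategicMerge (set or overwrite);
    add_if_absent=True models the +(key) anchor (set only when the key is missing).
    """
    result = dict(existing)
    for key, value in patch.items():
        if add_if_absent and key in result:
            continue  # keep the value the Pod already has
        result[key] = value
    return result


staging_pod_labels = {"app": "web", "environment": "staging"}
unlabelled_pod_labels = {"app": "web"}
patch = {"environment": "production"}

print(merge_labels(staging_pod_labels, patch, add_if_absent=False))    # overwritten
print(merge_labels(staging_pod_labels, patch, add_if_absent=True))     # preserved
print(merge_labels(unlabelled_pod_labels, patch, add_if_absent=True))  # added
```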

3. Generation Policies

Generation policies in Kyverno are used to create or update related resources, enhancing Kubernetes automation by responding to specific configurations or needs in real-time.

Use-Case: Automatically Creating a ConfigMap for Each New Namespace

This example policy generates a ConfigMap in every new namespace, providing default configuration values that workloads in that namespace can consume.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-configmap
spec:
  rules:
    - name: add-default-configmap
      match:
        resources:
          kinds:
            - Namespace
      generate:
        apiVersion: v1
        kind: ConfigMap
        name: default-config
        namespace: "{{request.object.metadata.name}}"
        # data holds the body of the generated resource, so ConfigMap keys go under data.data
        data:
          data:
            default-key: "default-value"

This generation policy is triggered whenever a new namespace is created, automatically provisioning a ConfigMap with default settings. This approach is especially useful in multi-tenant environments, ensuring new namespaces have essential configurations in place.
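The data field of a generate rule holds the body of the resource to create, so the ConfigMap's own key/value pairs belong under data.data. The {{request.object.metadata.name}} variable resolves to the name of the Namespace that triggered the rule. The Python sketch below renders that resource for a hypothetical team-a namespace. Its variable substitution handles only dotted paths, not the JMESPath expressions Kyverno supports.

```python
import copy
import re

# Hypothetical dict form of the generate block from the policy above.
GENERATE_RULE = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "name": "default-config",
    "namespace": "{{request.object.metadata.name}}",
    "data": {"data": {"default-key": "default-value"}},
}

VARIABLE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def substitute(template: str, context: dict) -> str:
    """Replace {{a.b.c}} references with values looked up in context (dotted paths only)."""
    def lookup(match: re.Match) -> str:
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return VARIABLE.sub(lookup, template)


def render(rule: dict, new_namespace: dict) -> dict:
    """Build the resource the generate rule would create for new_namespace."""
    context = {"request": {"object": new_namespace}}
    body = copy.deepcopy(rule["data"])  # data is the resource body itself
    body.update({
        "apiVersion": rule["apiVersion"],
        "kind": rule["kind"],
        "metadata": {
            "name": rule["name"],
            "namespace": substitute(rule["namespace"], context),
        },
    })
    return body


namespace = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": "team-a"}}
config_map = render(GENERATE_RULE, namespace)
print(config_map)
```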

Conclusion

Extending Kyverno policies enables Kubernetes administrators to establish and enforce tailored security and operational practices within their clusters. By leveraging Kyverno’s capabilities in validation, mutation, and generation, you can automate compliance, streamline operations, and reinforce security standards seamlessly.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.