Kubernetes Security Best Practices: A Production Hardening Guide

Kubernetes Security Best Practices: A Production Hardening Guide

Kubernetes security is not a single feature you enable — it is a layered discipline that spans the control plane, workloads, networking, supply chain, and runtime. This guide covers the security controls that matter most in production, why each one exists, and how to implement them without breaking your cluster.

The Kubernetes Attack Surface

Before hardening anything, understand what you are protecting. A Kubernetes cluster has several distinct attack surfaces:

  • API server — The central control plane. Any entity that can reach it with valid credentials can read cluster state, modify workloads, or escalate privileges.
  • etcd — Stores all cluster state in plain text, including Secrets. Direct etcd access is equivalent to root on every node.
  • Nodes — A compromised node can access all Secrets mounted on pods running on it, access the kubelet API, and potentially escape to the hypervisor.
  • Pods — Privileged pods, host-network pods, and pods with excessive capabilities can break container isolation.
  • Supply chain — Malicious images, compromised registries, and unsigned artifacts can introduce attacker-controlled code into your cluster.
  • RBAC — Overly permissive roles allow lateral movement and privilege escalation once an attacker gains any foothold.

The controls below address each of these surfaces. Prioritize based on your threat model — a public-facing multi-tenant cluster needs all of them; an internal development cluster can relax some.

1. RBAC: Least Privilege from Day One

Role-Based Access Control is Kubernetes’ primary authorization mechanism. Most clusters fail at RBAC not because it is misconfigured, but because it is over-permissive by default and nobody reviews it systematically.

Common RBAC Mistakes

  • Binding to cluster-admin for convenience. Almost no workload needs cluster-admin. Use namespaced roles wherever possible.
  • Using * verbs or resources in roles. Wildcard permissions are almost always broader than intended.
  • Not auditing ServiceAccount token usage. Every pod gets a ServiceAccount. The default ServiceAccount in most namespaces has no permissions, but custom workloads often get over-permissive SAs.
  • Forgetting automountServiceAccountToken: false. If a workload does not need to talk to the Kubernetes API, disable token mounting entirely.

Practical RBAC Patterns

For a workload that only needs to read ConfigMaps in its own namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-configmap-reader
  namespace: my-app
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: my-app
roleRef:
  kind: Role
  name: configmap-reader
  apiGroup: rbac.authorization.k8s.io

Audit existing RBAC with kubectl-who-can or rbac-tool to find overly permissive bindings before attackers do.

2. Pod Security Standards

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement is Pod Security Admission (PSA), a built-in admission controller that enforces one of three security profiles at the namespace level:

  • Privileged — No restrictions. For system components only.
  • Baseline — Prevents the most critical privilege escalations: privileged containers, hostPID, hostIPC, hostNetwork, dangerous capabilities.
  • Restricted — Enforces current hardening best practices. Requires running as non-root, dropping all capabilities, and using a restricted seccomp profile.

Enable enforcement at the namespace level with labels:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.30
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.30
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.30

A pod that runs as root or requests host-network in a namespace enforcing restricted will be rejected at admission. The warn and audit modes let you test before enforcing.

PSA covers the most critical pod-level escalations, but it is coarse-grained. For fine-grained policy control, use Kyverno alongside PSA.

3. Network Policies: Micro-Segmentation

By default, every pod in a Kubernetes cluster can communicate with every other pod across all namespaces. This is a flat network model that gives attackers unrestricted lateral movement once they compromise any workload.

Network Policies define L3/L4 allow-rules for pod-to-pod communication. They are enforced by your CNI plugin (Calico, Cilium, Weave — not Flannel, which does not support NetworkPolicy).

Default Deny Pattern

Start by denying all ingress and egress in every namespace, then open only what is explicitly needed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Then allow specific traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api
    ports:
    - protocol: TCP
      port: 5432

Do not forget DNS egress — most workloads need to resolve names via kube-dns, which requires UDP port 53 egress to the kube-system namespace.

4. Secrets Management

Kubernetes Secrets are base64-encoded, not encrypted. Stored in etcd in plain text by default. Anyone with get permission on Secrets can read them. This is not a vulnerability — it is a design decision that puts the responsibility on you to:

  • Enable encryption at rest for etcd. Configure EncryptionConfiguration with an AES-CBC or AES-GCM provider. This encrypts Secrets before they are written to etcd.
  • Use external secret stores. HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault with the External Secrets Operator means actual secret values never live in Kubernetes at all.
  • Restrict Secret RBAC aggressively. Never give list on Secrets cluster-wide — it returns all values. Use get on named resources where possible.
  • Avoid environment variables for secrets. Prefer volume mounts. Env vars are visible in pod inspect output and can leak through application logging.
# etcd encryption at rest - in kube-apiserver config
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}

5. Image Security and Supply Chain

Your runtime security posture is only as good as the images you run. A compromised image from a public registry bypasses every runtime control you have.

Scan images in CI

Use Trivy, Grype, or Snyk to scan images as part of your CI pipeline. Block deployments of images with critical CVEs:

# In your CI pipeline
trivy image --exit-code 1 --severity CRITICAL your-image:tag

Use a private registry with admission control

Only allow images from your private registry using an admission webhook (Kyverno, OPA Gatekeeper). This prevents developers from running arbitrary public images in production:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-registries
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Images must come from registry.company.com"
      pattern:
        spec:
          containers:
          - image: "registry.company.com/*"

Use distroless or minimal base images

Distroless images contain only the application and its runtime dependencies — no shell, no package manager, no debugging tools. This drastically reduces the attack surface and the number of CVEs. Google’s distroless images are available for Java, Node.js, Python, and Go.

Sign and verify images

Cosign (from the Sigstore project) lets you sign container images and verify signatures at admission time using Kyverno or Connaisseur. This prevents image substitution attacks where an attacker replaces a legitimate image in your registry.

6. Runtime Security

Runtime security detects and responds to malicious activity after a container is running. The primary tool in this space is Falco — a CNCF project that uses eBPF to monitor system calls and raise alerts when containers behave unexpectedly.

Default Falco rules catch common attack patterns:

  • Shell spawned in a container
  • Network connection to an unexpected IP
  • Write to a sensitive file path (/etc/passwd, /etc/shadow)
  • Privilege escalation via setuid binaries
  • Container drift (new executable files written at runtime)

Combine Falco with seccomp profiles to restrict the system calls a container can make at the kernel level. The RuntimeDefault seccomp profile (available since Kubernetes 1.27 as a default) blocks 300+ system calls that containers virtually never need.

spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 65534
      capabilities:
        drop: ["ALL"]

These four securityContext settings together (allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsNonRoot: true, capabilities.drop: ALL) make container escape significantly harder and satisfy the Kubernetes Restricted pod security standard.

7. API Server Hardening

The API server is the most critical component to harden. Key settings:

  • Disable anonymous authentication. --anonymous-auth=false ensures every request is authenticated.
  • Enable audit logging. Log all API server requests to a file or webhook. Without audit logs, you cannot investigate incidents or detect RBAC abuse.
  • Restrict admission plugins. Ensure NodeRestriction is enabled — it prevents node kubelets from modifying objects outside their own node.
  • Do not expose the API server to the internet. Use a VPN, bastion host, or private endpoint. If you must expose it, restrict access by IP.
# Minimal audit policy - log all requests at metadata level,
# and full request body for sensitive resources
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: Metadata
  omitStages: ["RequestReceived"]

8. etcd Security

etcd stores all cluster state. Treat it as sensitive as your production database:

  • Enable TLS for all etcd communication. Both peer communication (etcd-to-etcd) and client communication (apiserver-to-etcd) must use mutual TLS.
  • Restrict network access to etcd. etcd should only be reachable by the API server. Use firewall rules or security groups to enforce this.
  • Enable encryption at rest. As described in the Secrets section above.
  • Backup etcd regularly. An etcd snapshot is a complete copy of all cluster state, including all Secrets. Encrypt backups and store them separately from the cluster.

9. CIS Kubernetes Benchmark

The CIS Kubernetes Benchmark is a comprehensive checklist of security controls covering the control plane, nodes, and workloads. Running kube-bench against your cluster gives you a scored assessment against the CIS controls:

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs $(kubectl get pods -l app=kube-bench -o name)

kube-bench outputs PASS/FAIL/WARN for each control with remediation guidance. Run it after initial cluster setup and after major configuration changes.

10. Continuous Security Posture with Kubescape

Kubescape and similar tools (Starboard/Trivy Operator, KubeScore) provide continuous security scanning of live cluster state — not just a one-time audit. They check workloads against NSA/CISA hardening guidelines, MITRE ATT&CK framework, and the CIS benchmark in real time.

Deploy Trivy Operator for continuous in-cluster scanning:

helm repo add aquasecurity https://aquasecurity.github.io/helm-charts/
helm install trivy-operator aquasecurity/trivy-operator 
  --namespace trivy-system 
  --create-namespace 
  --set="trivy.ignoreUnfixed=true"

Trivy Operator creates VulnerabilityReport, ConfigAuditReport, and RbacAssessmentReport custom resources in the same namespace as each workload. These can be scraped by Prometheus and displayed in Grafana for a security dashboard.

Security Hardening Checklist

  • ✅ RBAC reviewed — no wildcard roles, no unnecessary cluster-admin bindings
  • ✅ ServiceAccount token automount disabled for workloads that do not need API access
  • ✅ Pod Security Standards enforced at namespace level (at least Baseline, Restricted where possible)
  • ✅ Network policies deployed — default deny with explicit allows
  • ✅ Secrets encrypted at rest in etcd
  • ✅ Images scanned in CI — no critical CVEs in production
  • ✅ Private registry enforced via admission control
  • ✅ Container securityContext hardened (non-root, read-only filesystem, no capabilities)
  • ✅ seccomp RuntimeDefault profile enabled
  • ✅ API server audit logging enabled
  • ✅ etcd TLS and network access restricted
  • ✅ kube-bench run and critical/high findings remediated
  • ✅ Runtime security (Falco) deployed and alerts routed to on-call
  • ✅ Continuous scanning (Trivy Operator or Kubescape) deployed

FAQ

Where do I start if my cluster has no security controls today?

Start with the highest-impact, lowest-effort controls first: audit your RBAC (revoke cluster-admin where not needed), enable Pod Security Admission in warn mode on all namespaces, and deploy Trivy Operator. These three steps give you immediate visibility and prevent the most common privilege escalations without breaking anything.

Does enabling Network Policies break DNS resolution?

Yes, if you deploy a default-deny egress policy without explicitly allowing DNS. Add an egress rule allowing UDP port 53 to the kube-dns service in kube-system when applying default-deny network policies.

Is Kubernetes certified for PCI-DSS or SOC 2?

Kubernetes itself is not certified — your configuration and the controls you implement determine compliance. The CIS Kubernetes Benchmark maps to many PCI-DSS and SOC 2 requirements. Managed Kubernetes offerings (EKS, GKE, AKS) have their own compliance certifications for the underlying infrastructure.

Should I use OPA Gatekeeper or Kyverno?

Both enforce admission policies, but Kyverno is Kubernetes-native (policies are written as YAML) while Gatekeeper uses Rego (a purpose-built policy language). For teams without Rego expertise, Kyverno is significantly faster to adopt and maintain. For teams already using OPA elsewhere in their stack, Gatekeeper offers consistency. Both integrate well with GitOps workflows.

How often should I update Kubernetes for security patches?

Follow a patch release within 30 days of release for CVEs rated High or Critical. Minor version upgrades (e.g., 1.29 → 1.30) should happen within the support window — Kubernetes maintains the last three minor versions. Falling more than one minor version behind means running without security patches for a growing subset of the codebase.

For a deeper look at how security fits into the broader Kubernetes platform architecture, see the Kubernetes architecture patterns guide and the guide on building a security-first Kubernetes culture.

Integrate Kyverno CLI into CI/CD Pipelines with GitHub Actions for Kubernetes Policy Checks

Integrate Kyverno CLI into CI/CD Pipelines with GitHub Actions for Kubernetes Policy Checks

Introduction

As Kubernetes clusters become an integral part of infrastructure, maintaining compliance with security and configuration policies is crucial. Kyverno, a policy engine designed for Kubernetes, can be integrated into your CI/CD pipelines to enforce configuration standards and automate policy checks. In this article, we’ll walk through integrating Kyverno CLI with GitHub Actions, providing a seamless workflow for validating Kubernetes manifests before they reach your cluster.

What is Kyverno CLI?

Kyverno is a Kubernetes-native policy management tool, enabling users to enforce best practices, security protocols, and compliance across clusters. Kyverno CLI is a command-line interface that lets you apply, test, and validate policies against YAML manifests locally or in CI/CD pipelines. By integrating Kyverno CLI with GitHub Actions, you can automate these policy checks, ensuring code quality and compliance before deploying resources to Kubernetes.

Benefits of Using Kyverno CLI in CI/CD Pipelines

Integrating Kyverno into your CI/CD workflow provides several advantages:

  1. Automated Policy Validation: Detect policy violations early in the CI/CD pipeline, preventing misconfigured resources from deployment.
  2. Enhanced Security Compliance: Kyverno enables checks for security best practices and compliance frameworks.
  3. Faster Development: Early feedback on policy violations streamlines the process, allowing developers to fix issues promptly.

Setting Up Kyverno CLI in GitHub Actions

Step 1: Install Kyverno CLI

To use Kyverno in your pipeline, you need to install the Kyverno CLI in your GitHub Actions workflow. You can specify the Kyverno version required for your project or use the latest version.

Here’s a sample GitHub Actions YAML configuration to install Kyverno CLI:

name: CI Pipeline with Kyverno Policy Checks

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  kyverno-policy-check:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Install Kyverno CLI
        run: |
          curl -LO https://github.com/kyverno/kyverno/releases/download/v<version>/kyverno-cli-linux.tar.gz
          tar -xzf kyverno-cli-linux.tar.gz
          sudo mv kyverno /usr/local/bin/

Replace <version> with the version of Kyverno CLI you wish to use. Alternatively, you can replace it with latest to always fetch the latest release.

Step 2: Define Policies for Validation

Create a directory in your repository to store Kyverno policies. These policies define the standards that your Kubernetes resources should comply with. For example, create a directory structure as follows:

.
└── .github
    └── policies
        ├── disallow-latest-tag.yaml
        └── require-requests-limits.yaml

Each policy is defined in YAML format and can be customized to meet specific requirements. Below are examples of policies that might be used:

  • Disallow latest Tag in Images: Prevents the use of the latest tag to ensure version consistency.
  • Enforce CPU/Memory Limits: Ensures resource limits are set for containers, which can prevent resource abuse.

Step 3: Add a GitHub Actions Step to Validate Manifests

In this step, you’ll use Kyverno CLI to validate Kubernetes manifests against the policies defined in the .github/policies directory. If a manifest fails validation, the pipeline will halt, preventing non-compliant resources from being deployed.

Here’s the YAML configuration to validate manifests:

- name: Validate Kubernetes Manifests
  run: |
    kyverno apply .github/policies -r manifests/

Replace manifests/ with the path to your Kubernetes manifests in the repository. This command applies all policies in .github/policies against each YAML file in the manifests directory, stopping the pipeline if any non-compliant configurations are detected.

Step 4: Handle Validation Results

To make the output of Kyverno CLI more readable, you can use additional GitHub Actions steps to format and handle the results. For instance, you might set up a conditional step to notify the team if any manifest is non-compliant:

- name: Check for Policy Violations
  if: failure()
  run: echo "Policy violation detected. Please review the failed validation."

Alternatively, you could configure notifications to alert your team through Slack, email, or other integrations whenever a policy violation is identified.

Example: Validating a Kubernetes Manifest

Suppose you have a manifest defining a Kubernetes deployment as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest  # Should trigger a violation

The policy disallow-latest-tag.yaml checks if any container image uses the latest tag and rejects it. When this manifest is processed, Kyverno CLI flags the image and halts the CI/CD pipeline with an error, preventing the deployment of this manifest until corrected.

Conclusion

Integrating Kyverno CLI into a GitHub Actions CI/CD pipeline offers a robust, automated solution for enforcing Kubernetes policies. With this setup, you can ensure Kubernetes resources are compliant with best practices and security standards before they reach production, enhancing the stability and security of your deployments.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Extending Kyverno Policies: Creating Custom Rules for Kubernetes Security

Extending Kyverno Policies: Creating Custom Rules for Kubernetes Security

Kyverno offers a robust, declarative approach to enforcing security and compliance standards within Kubernetes clusters by allowing users to define and enforce custom policies. For an in-depth look at Kyverno’s functionality, including core concepts and benefits, see my detailed article here. In this guide, we’ll focus on extending Kyverno policies, providing a structured walkthrough of its data model, and illustrating use cases to make the most of Kyverno in a Kubernetes environment.

Understanding the Kyverno Policy Data Model

Kyverno policies consist of several components that define how the policy should behave, which resources it should affect, and the specific rules that apply. Let’s dive into the main parts of the Kyverno policy model:

  1. Policy Definition: This is the root configuration where you define the policy’s metadata, including name, type, and scope. Policies can be created at the namespace level for specific areas or as cluster-wide rules to enforce uniform standards across the entire Kubernetes cluster.
  2. Rules: Policies are made up of rules that dictate what conditions Kyverno should enforce. Each rule can include logic for validation, mutation, or generation based on your needs.
  3. Match and Exclude Blocks: These sections allow fine-grained control over which resources the policy applies to. You can specify resources by their kinds (e.g., Pods, Deployments), namespaces, labels, and even specific names. This flexibility is crucial for creating targeted policies that impact only the resources you want to manage.
    1. Match block: Defines the conditions under which the rule applies to specific resources.
    2. Exclude block: Used to explicitly omit resources that match certain conditions, ensuring that unaffected resources are not inadvertently included.
  4. Validation, Mutation, and Generation Actions: Each rule can take different types of actions:
    1. Validation: Ensures resources meet specific criteria and blocks deployment if they don’t.
    2. Mutation: Adjusts resource configurations to align with predefined standards, which is useful for auto-remediation.
    3. Generation: Creates or manages additional resources based on existing resource configurations.

Example: Restricting Container Image Sources to Docker Hub

A common security requirement is to limit container images to trusted registries. The example below demonstrates a policy that only permits images from Docker Hub.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-dockerhub-images
spec:
  rules:
    - name: only-dockerhub-images
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Only Docker Hub images are allowed."
        pattern:
          spec:
            containers:
              - image: "docker.io/*"

This policy targets all Pod resources in the cluster and enforces a validation rule that restricts the image source to docker.io. If a Pod uses an image outside Docker Hub, Kyverno denies its deployment, reinforcing secure sourcing practices.

Practical Use-Cases for Kyverno Policies

Kyverno policies can handle a variety of Kubernetes management tasks through validation, mutation, and generation. Let’s explore examples for each type to illustrate Kyverno’s versatility:

1. Validation Policies

Validation policies in Kyverno ensure that resources comply with specific configurations or security standards, stopping any non-compliant resources from deploying.

Use-Case: Enforcing Resource Limits for Containers

This example prevents deployments that lack resource limits, ensuring all Pods specify CPU and memory constraints.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-resource-limits
spec:
  rules:
    - name: require-resource-limits
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Resource limits (CPU and memory) are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    cpu: "?*"
                    memory: "?*"

By enforcing resource limits, this policy helps prevent resource contention in the cluster, fostering stable and predictable performance.

2. Mutation Policies

Mutation policies allow Kyverno to automatically adjust configurations in resources to meet compliance requirements. This approach is beneficial for consistent configurations without manual intervention.

Use-Case: Adding Default Labels to Pods

This policy adds a default label, environment: production, to all new Pods that lack this label, ensuring that resources align with organization-wide labeling standards.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-label
spec:
  rules:
    - name: add-environment-label
      match:
        resources:
          kinds:
            - Pod
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              environment: "production"

This mutation policy is an example of how Kyverno can standardize resource configurations at scale by dynamically adding missing information, reducing human error and ensuring labeling consistency.

3. Generation Policies

Generation policies in Kyverno are used to create or update related resources, enhancing Kubernetes automation by responding to specific configurations or needs in real-time.

Use-Case: Automatically Creating a ConfigMap for Each New Namespace

This example policy generates a ConfigMap in every new namespace, setting default configuration values for all resources in that namespace.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-configmap
spec:
  rules:
    - name: add-default-configmap
      match:
        resources:
          kinds:
            - Namespace
      generate:
        kind: ConfigMap
        name: default-config
        namespace: "{{request.object.metadata.name}}"
        data:
          default-key: "default-value"

This generation policy is triggered whenever a new namespace is created, automatically provisioning a ConfigMap with default settings. This approach is especially useful in multi-tenant environments, ensuring new namespaces have essential configurations in place.

Conclusion

Extending Kyverno policies enables Kubernetes administrators to establish and enforce tailored security and operational practices within their clusters. By leveraging Kyverno’s capabilities in validation, mutation, and generation, you can automate compliance, streamline operations, and reinforce security standards seamlessly.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Prevent Server Information Disclosure in Kubernetes with Istio Service Mesh

Prevent Server Information Disclosure in Kubernetes with Istio Service Mesh

In today’s digital landscape, where data breaches and cyber threats are becoming increasingly sophisticated, ensuring the security of your servers is paramount. One of the critical security concerns that organizations must address is “Server Information Disclosure.” Server Information Disclosure occurs when sensitive information about a server’s configuration, technology stack, or internal structure is inadvertently exposed to unauthorized parties. Hackers can exploit this vulnerability to gain insights into potential weak points and launch targeted attacks. Such breaches can lead to data theft, service disruption, and reputation damage.

Information Disclosure and Istio Service Mesh

One example is the Server HTTP Header, usually included in most of the HTTP responses where you have the server that is providing this response. The values can vary depending on the stack, but matters such as Jetty, Tomcat, or similar ones are usually seen. But also, if you are using a Service Mesh such as Istio, you will see the header with a value of istio-envoy, as you can see here:

Information Disclosure of Server Implementation using Istio Service mesh

As commented, this is of such importance for several levels of security, such as:

  • Data Privacy: Server information leakage can expose confidential data, undermining user trust and violating data privacy regulations such as GDPR and HIPAA.
  • Reduced Attack Surface: By concealing server details, you minimize the attack surface available to potential attackers.
  • Security by Obscurity: While not a foolproof approach, limiting disclosure adds an extra layer of security, making it harder for hackers to gather intelligence.

How to mitigate that with Istio Service Mesh?

When using Istio, we can define different rules to add and remove HTTP headers based on our needs, as you can see in the following documentation here: https://discuss.istio.io/t/remove-header-operation/1692 using simple clauses to the definition of your VirtualService as you can see here:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: k8snode-virtual-service
spec:
  hosts:
  - "example.com"
  gateways:
  - k8snode-gateway
  http:
    headers:
      response:
        remove:
          - "x-my-fault-source"
  - route:
    - destination:
        host: k8snode-service
        subset: version-1 

Unfortunately, this is not useful for all HTTP headers, especially the “main” ones, so the ones that are not custom added by your workloads but the ones that are mainly used and defined in the HTTP W3C standard https://www.w3.org/Protocols/

So, in the case of the Server HTTP header is a little bit more complex to do, and you need to use an EnvoyFilter, one of the most sophisticated objects part of the Istio Service Mesh. Based on the words in the official Istio documentation, an EnvoyFilter provides a mechanism to customize the Envoy configuration generated by Istio Pilot. So, you can use EnvoyFilter to modify values for certain fields, add specific filters, or even add entirely new listeners, clusters, etc.

EnvoyFilter Implementation to Remove Header

So now that we know that we need to create a custom EnvoyFilter let’s see which one we need to use to remove the Server header and how this is made to get more knowledge about this component. Here you can see the EnvoyFilter for that job:

---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: gateway-response-remove-headers
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: "envoy.filters.network.http_connection_manager"
    patch:
      operation: MERGE
      value:
        typed_config:
          "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager"
          server_header_transformation: PASS_THROUGH
  - applyTo: ROUTE_CONFIGURATION
    match:
      context: GATEWAY
    patch:
      operation: MERGE
      value:
        response_headers_to_remove:
        - "server"

So let’s focus on the parts of the specification of the EnvoyFilter where we can get for one side the usual workloadSelector, to know where this component will be applied, that in this case will be the istio ingressgateway. Then we enter into the configPatches section, that are the sections where we use the customization that we need to do, and in our case, we have two of them:

Both act on the context: GATEWAY and apply to two different objects: NETWORK\_FILTER AND ROUTE\_CONFIGURATION. You can also use filters on sidecars to affect the behavior of them. The first bit what it does is including the custom filter http\_connection\_maanger that allows the manipulation of the HTTP context, including for our primary purpose also the HTTP header, and then we have the section bit that acts on the ROUTE\_CONFIGURATION removing the server header as we can see by using the option response_header_to_remove

Conclusion

As you can see, this is not easy to implement. Still, at the same time, it is evidence of the power and low-level capabilities that you have when using a robust service mesh such as Istio to interact and modify the behavior of any tiny detail that you want for your benefit and, in this case, also to improve and increase the security of your workloads deployed behind the Service Mesh scope.

In the ever-evolving landscape of cybersecurity threats, safeguarding your servers against information disclosure is crucial to protect sensitive data and maintain your organization’s integrity. Istio empowers you to fortify your server security by providing robust tools for traffic management, encryption, and access control.

Remember, the key to adequate server security is a proactive approach that addresses vulnerabilities before they can be exploited. Take the initiative to implement Istio and elevate your server protection.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.