Kubernetes Security Best Practices: 2026 Production Hardening Guide

Kubernetes security is not a single feature you enable — it is a layered discipline that spans the control plane, workloads, networking, supply chain, and runtime. Get one layer wrong and the others rarely save you. This guide covers the controls that matter most in production, why each one exists, and how to implement them without breaking your cluster — plus a prioritized roadmap so you know what to do in your first week, not just an undifferentiated list of “best practices.”

Let me start with the part most hardening guides skip: what an actual attack looks like.

Anatomy of a Real Kubernetes Attack Chain

Abstract advice (“apply least privilege”) doesn’t land until you’ve seen how a single misconfiguration cascades. Here is a realistic chain — every step maps to a documented technique in the MITRE ATT&CK for Containers matrix. If you haven’t seen it before, ATT&CK is an industry-standard, openly maintained knowledge base of real-world adversary behavior: a catalogue of how attackers actually operate, organized by goal (initial access, credential access, lateral movement, and so on). It’s the common language security teams use to describe and defend against attacks.

  1. Initial access. An application pod runs a vulnerable image, for example one with an unpatched dependency that has a remote code execution (RCE) flaw: a bug that lets an attacker run arbitrary code inside the vulnerable process. The attacker gets code execution inside the container. So far, container isolation should contain the blast radius.
  2. Credential access. The pod has automountServiceAccountToken: true (the default). The attacker reads /var/run/secrets/kubernetes.io/serviceaccount/token — a valid API credential, handed to them for free.
  3. Discovery. Using that token, the attacker queries the API server. The ServiceAccount was bound to a convenient cluster-admin role “to unblock a deploy.” Now they can list every Secret in every namespace.
  4. Lateral movement. They read database credentials, cloud provider keys, and other ServiceAccount tokens from Secrets. The flat pod network (no NetworkPolicies) lets them reach internal services directly.
  5. Privilege escalation / escape. They schedule a privileged pod with hostPID and the host filesystem mounted, then break out to the node. From the node, they reach the kubelet and other tenants’ workloads.
  6. Impact. Crypto-mining, data exfiltration, or ransomware across the cluster.

Notice that every step from 2 through 5 had a small, well-known fix: disable token automount (step 2), scope the RBAC (step 3), encrypt Secrets or use an external store and apply default-deny NetworkPolicies (step 4), and enforce Pod Security (step 5). Defense in depth means an attacker has to defeat every layer, and opportunistic attackers tend to move on when the easy chain breaks. The rest of this guide is those layers, ordered by how much they shrink that chain.

The Kubernetes Attack Surface

Before hardening anything, understand what you are protecting. A Kubernetes cluster has several distinct attack surfaces:

  • API server — The central control plane. Any entity that can reach it with valid credentials can read cluster state, modify workloads, or escalate privileges.
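  • etcd — Stores all cluster state, including Secrets, unencrypted by default. Direct read/write access to etcd is effectively equivalent to cluster-admin, because it bypasses API server authentication, RBAC, and admission control.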
  • Nodes — A compromised node can access all Secrets mounted on pods running on it, access the kubelet API, and potentially escape to the hypervisor.
  • Pods — Privileged pods, host-network pods, and pods with excessive capabilities can break container isolation.
  • Supply chain — Malicious images, compromised registries, and unsigned artifacts can introduce attacker-controlled code into your cluster.
  • RBAC — Overly permissive roles allow lateral movement and privilege escalation once an attacker gains any foothold.

Prioritize based on your threat model — a public-facing multi-tenant cluster needs all of these; an internal development cluster can relax some.

The First-Week Hardening Roadmap (Prioritized)

If you inherited a cluster with nothing in place, do not try to do everything at once. Order matters — some controls give huge risk reduction for minimal effort and zero breakage risk, others need careful rollout. This is the sequence I use:

Day | Control | Risk reduction | Breakage risk
--- | --- | --- | ---
1 | Audit RBAC, remove stray cluster-admin, disable unused SA token automount | High | Low
1 | Enable API server audit logging | Medium (visibility) | None
2 | Pod Security Admission in warn + audit mode (all namespaces) | High | None (warn only)
3 | Deploy image scanning in CI (Trivy/Grype), fail on Critical | High | Low
4 | NetworkPolicies in audit-style rollout: default-deny in one namespace first | High | Medium (test DNS!)
5 | Enable etcd encryption at rest / move Secrets to external store | High | Low
6 | Flip Pod Security Admission to enforce: baseline, then restricted per namespace | High | Medium
7 | Deploy runtime detection (Falco) + continuous scanning (Trivy Operator) | Medium | None

The single most important idea: roll out enforcing controls in observation mode first (warn/audit for Pod Security, default-deny NetworkPolicies in one namespace). You want to discover what breaks in a dashboard, not in an incident.

Tools to automate and report each step

You don’t have to do any of this by hand. Each step has tooling that both applies the control and reports on its state, so you can wire it into CI or a recurring job. The numbered sections that follow name the tools for each control.

1. RBAC: Least Privilege from Day One

Role-Based Access Control is Kubernetes’ primary authorization mechanism. Most clusters fail at RBAC not because of one dramatic misconfiguration, but because permissions are granted generously for convenience, accumulate over time, and are never reviewed systematically.

Common RBAC Mistakes

  • Binding to cluster-admin for convenience. Almost no workload needs cluster-admin. Use namespaced roles wherever possible.
  • Using * verbs or resources in roles. Wildcard permissions are almost always broader than intended.
  • Not auditing ServiceAccount token usage. Every pod gets a ServiceAccount. Custom workloads often get over-permissive SAs.
  • Forgetting automountServiceAccountToken: false. If a workload does not need to talk to the Kubernetes API, disable token mounting entirely — this single setting breaks step 2 of the attack chain above.

Practical RBAC Patterns

For a workload that only needs to read ConfigMaps in its own namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-configmap-reader
  namespace: my-app
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: my-app
roleRef:
  kind: Role
  name: configmap-reader
  apiGroup: rbac.authorization.k8s.io

And disable token automount on the workload that doesn’t call the API at all:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: my-app
automountServiceAccountToken: false

Audit existing RBAC with kubectl-who-can or rbac-tool to find overly permissive bindings before attackers do. A useful one-liner: list every subject that can read Secrets cluster-wide with kubectl who-can get secrets.

2. Pod Security Standards (and Migrating off PodSecurityPolicy)

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement is Pod Security Admission (PSA), a built-in admission controller that enforces one of three Pod Security Standards profiles at the namespace level:

  • Privileged — No restrictions. For system components only.
  • Baseline — Prevents the most critical privilege escalations: privileged containers, hostPID, hostIPC, hostNetwork, dangerous capabilities.
  • Restricted — Enforces current hardening best practices. Requires running as non-root, dropping all capabilities, and using a restricted seccomp profile.

Enable enforcement at the namespace level with labels:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.30
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.30
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.30

A pod that runs as root or requests host-network in a namespace enforcing restricted will be rejected at admission. The warn and audit modes let you test before enforcing. For a full walkthrough of how PSA evaluates pods and how to roll it out, see my guide on understanding Pod Security Admission.

Migrating from PodSecurityPolicy to PSA

If you’re still on a cluster that used PSP, the migration path is:

  1. Map your PSPs to the closest PSA level. Most “restricted” PSPs map to restricted; permissive ones to baseline. The official pspmigrator tool can suggest mappings.
  2. Label every namespace in warn/audit mode matching that level — no enforcement yet.
  3. Watch the audit logs and warnings for a release cycle. Fix the workloads that would be rejected (add securityContext, drop capabilities).
  4. Flip to enforce namespace by namespace, starting with the least critical.
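
Step 2 of the migration can be sketched as a namespace labelled for observation only. The namespace name staging and the choice of the restricted profile are assumptions for illustration; there is deliberately no enforce label, so nothing is rejected yet.

```yaml
# Observation-only PSA labels: violations are surfaced, never blocked.
apiVersion: v1
kind: Namespace
metadata:
  name: staging   # hypothetical namespace name
  labels:
    # Returns a warning to the client (for example kubectl) when a pod
    # would violate the profile
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.30
    # Annotates the API server audit log entry for the same violation
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.30
```

Once the warnings and audit annotations for a namespace have gone quiet, adding the matching pod-security.kubernetes.io/enforce label is the "flip" described in step 4.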

PSA is intentionally coarse-grained — three levels, namespace-scoped. For anything finer (per-team registries, required labels, custom mutation), you need a policy engine, which is the next section.

3. Policy Engines: Kyverno vs OPA Gatekeeper

Once you outgrow PSA’s three levels, you need an admission policy engine. The two standards are Kyverno and OPA Gatekeeper, and choosing between them is one of the most common platform decisions.

Aspect | Kyverno | OPA Gatekeeper
--- | --- | ---
Policy language | YAML (Kubernetes-native) | Rego (purpose-built DSL)
Learning curve | Low: looks like other manifests | Steep: Rego is its own paradigm
Mutation support | Yes, first-class | Limited
Image verification (Cosign) | Built-in | Via external data
Best when | Team wants fast adoption, K8s-only | Team already runs OPA across the stack

For most teams without existing Rego expertise, Kyverno is significantly faster to adopt and maintain. A Kyverno policy to require all images come from your private registry:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-registries
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Images must come from registry.company.com"
      pattern:
        spec:
          containers:
          - image: "registry.company.com/*"
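
The registry rule above is a validate rule. Kyverno's mutation support, the other capability named in the comparison, can be sketched in the same style. The example below is a minimal sketch that turns off token automount on every namespace's default ServiceAccount; the policy and rule names are hypothetical.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disable-default-sa-token-automount   # hypothetical policy name
spec:
  rules:
  - name: set-automount-false
    match:
      any:
      - resources:
          kinds: ["ServiceAccount"]
          names: ["default"]
    mutate:
      # Merged into the ServiceAccount when it is created or updated
      patchStrategicMerge:
        automountServiceAccountToken: false
```

As written, the rule only acts at admission time, so default ServiceAccounts that already exist are changed the next time they are updated. Workloads that genuinely need API access should use a dedicated ServiceAccount rather than the default one.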

Both integrate cleanly with GitOps — store policies in Git, apply via Argo CD or Flux, and you get an auditable history of every policy change. I’ve written several deep dives on this: Kyverno: enforcing standard and custom policies, extending Kyverno with custom rules, and running the Kyverno CLI in CI/CD with GitHub Actions — or browse everything under the policies tag.

4. Network Policies: Micro-Segmentation

By default, every pod in a Kubernetes cluster can communicate with every other pod across all namespaces. This flat network model gives attackers unrestricted lateral movement once they compromise any workload (step 4 of the attack chain).

Network Policies define L3/L4 allow-rules for pod-to-pod communication. They are enforced by your CNI (Container Network Interface) plugin (Calico, Cilium, Weave — not Flannel, which does not support NetworkPolicy).

Default Deny Pattern

Start by denying all ingress and egress in a namespace, then open only what is explicitly needed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Then allow specific traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api
    ports:
    - protocol: TCP
      port: 5432
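
Because the default-deny policy above also blocks egress, the ingress rule on postgres is only half of the path: the api pods need a matching egress rule before the connection works. A minimal sketch, reusing the labels from the example (the policy name is an assumption):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-egress-to-db   # hypothetical name
  namespace: production
spec:
  # Applies to the client side of the connection
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
```

The api pods also need DNS egress to resolve the postgres Service name, which is the subject of the next subsection.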

The DNS Trap (the #1 reason default-deny “breaks everything”)

The most common NetworkPolicy support ticket: “I applied default-deny and the whole namespace stopped working.” The cause is almost always DNS. A default-deny egress policy blocks the pod from reaching kube-dns, so every name resolution fails and applications appear to hang or crash-loop.

Always pair default-deny egress with an explicit DNS allow rule:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Roll default-deny out in one namespace first, confirm DNS and required egress work, then expand. Tools like Cilium’s Hubble or Calico’s flow logs make it much easier to see exactly which flows you need to allow.

5. Secrets Management

Kubernetes Secrets are base64-encoded, not encrypted. They are stored in etcd in plain text by default. Anyone with get permission on Secrets can read them. This is not a vulnerability — it is a design decision that puts the responsibility on you to:

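  • Enable encryption at rest for etcd. Configure an EncryptionConfiguration so Secrets are encrypted before being written to etcd. The aescbc and aesgcm providers use a key stored on the control plane node; where your platform offers one, a KMS provider keeps that key outside the cluster.
  • Use external secret stores. HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault with the External Secrets Operator makes the external store the source of truth, so secret values stay out of Git and hand-written manifests. The operator still syncs them into Kubernetes Secrets, so encryption at rest and tight RBAC continue to matter; if values must never reach etcd, inject them at pod start instead (Vault Agent injector or the Secrets Store CSI driver).
  • Restrict Secret RBAC aggressively. Never give list on Secrets cluster-wide, because a list response includes every value. Use get on named resources where possible.
  • Avoid environment variables for secrets. Prefer volume mounts. Env vars are inherited by child processes, appear in container inspect output, and can leak through application logging.

# etcd encryption at rest - in kube-apiserver config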
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}
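
The "get on named resources" advice from the list above can be sketched as a Role that grants access to exactly one Secret. The Role name and the Secret name db-credentials are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-db-credentials   # hypothetical Role name
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["secrets"]
  # Only this one Secret is readable
  resourceNames: ["db-credentials"]   # hypothetical Secret name
  # get only: no list or watch, which would return Secret values in bulk
  verbs: ["get"]
```

Bind it to the workload's ServiceAccount with a RoleBinding, as in the RBAC section.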

For the full external-store pattern, see the guide on injecting secrets into pods with HashiCorp Vault.

6. Image Security and Supply Chain

Your runtime security posture is only as good as the images you run. A compromised image from a public registry bypasses every runtime control you have.

Scan images in CI

Use Trivy, Grype, or Snyk to scan images as part of CI. Block deployments of images with critical CVEs (Common Vulnerabilities and Exposures — publicly catalogued security flaws). I’ve covered the practical side of this in scanning Docker images with Trivy, scanning your images locally before they ship, and a broader roundup of open-source development security tools.

# In your CI pipeline
trivy image --exit-code 1 --severity CRITICAL your-image:tag
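
In a GitHub Actions pipeline the same gate might look like the sketch below. The workflow name, the image name my-app, and running Trivy from its container image are assumptions for illustration; the Trivy flags are the same ones used in the command above, plus --ignore-unfixed.

```yaml
name: image-scan   # hypothetical workflow name
on:
  pull_request:

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    - name: Build image
      run: docker build -t my-app:${{ github.sha }} .

    - name: Scan image with Trivy
      # A non-zero exit code fails the job when a CRITICAL CVE with an
      # available fix is found
      run: |
        docker run --rm \
          -v /var/run/docker.sock:/var/run/docker.sock \
          aquasec/trivy:latest image \
          --exit-code 1 --severity CRITICAL --ignore-unfixed \
          my-app:${{ github.sha }}
```

For reproducible results, pin the Trivy image to a specific version instead of latest.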

Use a private registry with admission control

Only allow images from your private registry using an admission webhook (Kyverno, OPA Gatekeeper) — the policy in section 3 does exactly this. It prevents developers from running arbitrary public images in production.

Use distroless or minimal base images

Distroless images contain only the application and its runtime dependencies — no shell, no package manager, no debugging tools. This drastically reduces both the attack surface and the CVE count. Google’s distroless images are available for Java, Node.js, Python, and Go. (Related: debugging distroless containers when you do need to inspect one.)

Sign and verify images (and the SLSA angle)

Cosign (from the Sigstore project) lets you sign container images and verify signatures at admission time using Kyverno or Connaisseur. This prevents image-substitution attacks where an attacker replaces a legitimate image in your registry.
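
As a sketch of admission-time verification with Kyverno, the policy below rejects pods whose images from registry.company.com do not carry a valid Cosign signature for the given key. The policy name is hypothetical and the public key is a placeholder to be replaced with your own.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures   # hypothetical policy name
spec:
  validationFailureAction: Enforce
  rules:
  - name: verify-cosign-signature
    match:
      any:
      - resources:
          kinds: ["Pod"]
    verifyImages:
    # Only images matching these references are checked
    - imageReferences:
      - "registry.company.com/*"
      attestors:
      - entries:
        - keys:
            # Placeholder: paste the public half of your Cosign key pair
            publicKeys: |-
              -----BEGIN PUBLIC KEY-----
              <your-cosign-public-key>
              -----END PUBLIC KEY-----
```

Starting with validationFailureAction: Audit follows the same observe-first approach recommended for Pod Security Admission.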

If you’re being asked about supply-chain compliance, the framework to know is SLSA (Supply-chain Levels for Software Artifacts). The practical progression: SLSA L1 = you have a build provenance document; L2 = it’s signed and the build is hosted; L3 = the build is hardened and non-falsifiable. Generating provenance with your CI (GitHub Actions has native SLSA generators) and verifying it at admission with Cosign + Kyverno gets you most of the way to L2/L3 without a platform rebuild.

7. Runtime Security

Runtime security detects and responds to malicious activity after a container is running. The primary tool is Falco — a CNCF project that uses eBPF (extended Berkeley Packet Filter — a Linux kernel technology for running sandboxed observability programs) to monitor system calls and raise alerts when containers behave unexpectedly.

Default Falco rules catch common attack patterns:

  • Shell spawned in a container
  • Network connection to an unexpected IP
  • Write to a sensitive file path (/etc/passwd, /etc/shadow)
  • Privilege escalation via setuid binaries
  • Container drift (new executable files written at runtime)
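
Custom rules use the same YAML format as the defaults. The sketch below covers the first pattern in the list; the rule name is hypothetical, and it assumes the file is loaded after Falco's default rules so that the spawned_process and container macros are defined.

```yaml
- rule: Shell spawned in application container   # hypothetical rule name
  desc: Detect a shell process starting inside a container
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name
    image=%container.image.repository cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```

Expect to tune the condition for your environment: images that legitimately start through a shell entrypoint will trigger it until they are excluded.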

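Combine Falco with seccomp profiles to restrict the system calls a container can make at the kernel level. The RuntimeDefault seccomp profile applies the container runtime's default filter, which blocks several dozen system calls that containers virtually never need (Docker's default profile, for example, disables around 44 of the 300+ available). Since Kubernetes 1.27 the kubelet's --seccomp-default flag is generally available and makes RuntimeDefault the default for every pod on the node; without it, set the profile per workload.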

spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 65534
      capabilities:
        drop: ["ALL"]

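These four securityContext settings (allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsNonRoot: true, capabilities.drop: ALL) make container escape significantly harder. Combined with the RuntimeDefault seccomp profile set at the pod level above, they satisfy the Kubernetes Restricted Pod Security Standard (readOnlyRootFilesystem goes beyond what Restricted requires). Enforced through admission, they help close step 5 of the attack chain.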

8. API Server Hardening

The API server is the most critical component to harden. Key settings:

  • Disable anonymous authentication. --anonymous-auth=false ensures every request is authenticated.
  • Enable audit logging. Log all API server requests to a file or webhook. Without audit logs, you cannot investigate incidents or detect RBAC abuse.
  • Restrict admission plugins. Ensure NodeRestriction is enabled — it prevents node kubelets from modifying objects outside their own node.
  • Do not expose the API server to the internet. Use a VPN, bastion host, or private endpoint. If you must expose it, restrict access by IP.

# Minimal audit policy - log all requests at Metadata level.
# Secrets and ConfigMaps stay at Metadata on purpose: RequestResponse
# would write their values into the audit log.
# RBAC changes are logged with full request and response bodies.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
- level: Metadata
  omitStages: ["RequestReceived"]
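
On a self-managed control plane, the settings from the list above are passed as kube-apiserver flags. The fragment below is a sketch that assumes a kubeadm-style static pod manifest; the file paths and retention values are illustrative assumptions, and both paths must also be mounted into the pod through hostPath volumes.

```yaml
# Fragment of /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm layout assumed)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --anonymous-auth=false
    - --enable-admission-plugins=NodeRestriction
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml   # hypothetical path
    - --audit-log-path=/var/log/kubernetes/audit/audit.log    # hypothetical path
    - --audit-log-maxage=30      # days to keep rotated log files
    - --audit-log-maxbackup=10   # number of rotated files to keep
    - --audit-log-maxsize=100    # size in MB before rotation
```

Before disabling anonymous auth, check whether health probes or load balancer checks reach the API server unauthenticated, since they stop working once it is off. Managed offerings (EKS, GKE, AKS) expose audit logging through their own settings rather than these flags.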

9. etcd Security

etcd stores all cluster state. Treat it as sensitive as your production database:

  • Enable TLS for all etcd communication — both peer (etcd-to-etcd) and client (apiserver-to-etcd) with mutual TLS.
  • Restrict network access to etcd. It should only be reachable by the API server. Use firewall rules or security groups.
  • Enable encryption at rest (see Secrets section).
  • Back up etcd regularly. A snapshot is a complete copy of all cluster state, including all Secrets. Encrypt backups and store them separately from the cluster.
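
For the TLS item in the list above, the relevant etcd flags look like the fragment below. This is a sketch assuming a kubeadm-style static pod manifest and kubeadm's default certificate paths; adjust both for other installers.

```yaml
# Fragment of /etc/kubernetes/manifests/etcd.yaml (kubeadm layout assumed)
spec:
  containers:
  - name: etcd
    command:
    - etcd
    # Client traffic (API server to etcd): TLS plus required client certificates
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --client-cert-auth=true
    # Peer traffic (etcd to etcd): mutual TLS between cluster members
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-client-cert-auth=true
```

With --client-cert-auth=true, etcd rejects any client that cannot present a certificate signed by the trusted CA, which complements the network-level restriction.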

10. Multi-Tenancy Isolation

If multiple teams or customers share a cluster, namespace boundaries alone are not a security boundary — they’re an organizational one. Hardening multi-tenant clusters adds requirements on top of everything above:

  • Namespace-per-tenant with ResourceQuotas and LimitRanges to prevent noisy-neighbor and resource-exhaustion DoS.
  • NetworkPolicies that deny cross-namespace traffic by default, so tenant A cannot reach tenant B’s pods.
  • A policy engine enforcing per-tenant rules (allowed registries, required labels, no hostPath).
  • Separate node pools for untrusted workloads, or a sandboxed runtime (gVisor, Kata Containers) when you run genuinely untrusted code.
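
The first two items in the list above can be sketched for a single tenant as follows. The namespace tenant-a, the object names, and all numeric limits are assumptions to be sized for your own tenants.

```yaml
# Cap the total resources the tenant namespace can claim
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a   # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
---
# Give containers default requests/limits so the quota can be enforced
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: tenant-a
spec:
  limits:
  - type: Container
    default:          # applied as limits when a container sets none
      cpu: 500m
      memory: 512Mi
    defaultRequest:   # applied as requests when a container sets none
      cpu: 100m
      memory: 128Mi
---
# Accept ingress only from pods in the same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}   # no namespaceSelector: matches this namespace only
```

Workloads exposed through an ingress controller need an additional rule that allows traffic from the controller's namespace.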

For hard multi-tenancy (untrusted tenants), the honest answer is that vanilla namespaces aren’t enough — consider virtual clusters (vCluster) or separate clusters entirely. Soft multi-tenancy (trusted internal teams) is well served by the controls in this guide.

11. Benchmarks and Continuous Posture

CIS Kubernetes Benchmark

The CIS Kubernetes Benchmark is a comprehensive checklist covering the control plane, nodes, and workloads. Running kube-bench gives you a scored assessment:

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs $(kubectl get pods -l app=kube-bench -o name)

kube-bench outputs PASS/FAIL/WARN for each control with remediation guidance. Run it after initial cluster setup and after major configuration changes.

Continuous scanning with Trivy Operator / Kubescape

Kubescape and the Trivy Operator provide continuous security scanning of live cluster state — not just a one-time audit. They check workloads against NSA/CISA hardening guidelines, the MITRE ATT&CK framework, and the CIS benchmark in real time.

helm repo add aquasecurity https://aquasecurity.github.io/helm-charts/
helm install trivy-operator aquasecurity/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  --set="trivy.ignoreUnfixed=true"

Trivy Operator creates VulnerabilityReport, ConfigAuditReport, and RbacAssessmentReport custom resources alongside each workload. The operator also exposes summary metrics derived from these reports; scrape them with Prometheus and build a security dashboard in Grafana.

Security Hardening Checklist

  • ✅ RBAC reviewed — no wildcard roles, no unnecessary cluster-admin bindings
  • ✅ ServiceAccount token automount disabled for workloads that do not need API access
  • ✅ Pod Security Standards enforced at namespace level (at least Baseline, Restricted where possible)
  • ✅ Policy engine (Kyverno/Gatekeeper) enforcing registry, label, and mutation rules
  • ✅ Network policies deployed — default deny with explicit allows (including DNS!)
  • ✅ Secrets encrypted at rest in etcd or moved to an external store
  • ✅ Images scanned in CI — no critical CVEs in production
  • ✅ Private registry enforced via admission control
  • ✅ Image signing + verification (Cosign) and build provenance (SLSA)
  • ✅ Container securityContext hardened (non-root, read-only fs, no capabilities)
  • ✅ seccomp RuntimeDefault profile enabled
  • ✅ API server audit logging enabled, anonymous auth disabled
  • ✅ etcd TLS and network access restricted
  • ✅ Multi-tenancy isolation (quotas, cross-namespace deny) if shared
  • ✅ kube-bench run and critical/high findings remediated
  • ✅ Runtime security (Falco) deployed and alerts routed to on-call
  • ✅ Continuous scanning (Trivy Operator or Kubescape) deployed

FAQ

Where do I start if my cluster has no security controls today?

Follow the first-week roadmap above. The short version: audit RBAC (revoke stray cluster-admin), enable Pod Security Admission in warn mode on all namespaces, and deploy image scanning + Trivy Operator. These give immediate visibility and stop the most common privilege escalations without breaking anything.

Does enabling Network Policies break DNS resolution?

Yes — this is the single most common failure. A default-deny egress policy blocks pods from reaching kube-dns, so name resolution fails. Add an egress rule allowing UDP and TCP port 53 to the kube-system namespace whenever you apply default-deny (see the DNS allow policy above).

Should I use OPA Gatekeeper or Kyverno?

Both enforce admission policies. Kyverno is Kubernetes-native (policies are YAML) while Gatekeeper uses Rego. For teams without Rego expertise, Kyverno is faster to adopt and supports mutation and Cosign verification out of the box. Choose Gatekeeper if you already run OPA elsewhere and want one policy language across your stack.

What replaced PodSecurityPolicy?

Pod Security Admission (PSA), a built-in admission controller that has been stable since Kubernetes 1.25 (and enabled by default as beta since 1.23). It enforces three profiles (privileged/baseline/restricted) via namespace labels. For finer-grained control than PSA's three levels, add Kyverno or Gatekeeper.

Is Kubernetes certified for PCI-DSS or SOC 2?

Kubernetes itself is not certified — your configuration and the controls you implement determine compliance. The CIS Kubernetes Benchmark maps to many PCI-DSS and SOC 2 requirements. Managed offerings (EKS, GKE, AKS) carry their own compliance certifications for the underlying infrastructure.

How often should I update Kubernetes for security patches?

Apply a patch release within 30 days for High/Critical CVEs. Minor version upgrades (e.g., 1.30 → 1.31) should happen within the support window: Kubernetes maintains release branches for the three most recent minor versions. Once your version falls out of that window, it no longer receives security patches.

Are namespaces a security boundary?

No. Namespaces are an organizational boundary. Real isolation between tenants requires NetworkPolicies, ResourceQuotas, a policy engine, and — for untrusted workloads — sandboxed runtimes (gVisor/Kata) or separate clusters.


For a deeper look at how security fits into the broader Kubernetes platform architecture, see the Kubernetes architecture patterns guide and the guide on building a security-first Kubernetes culture.

Integrate Kyverno CLI into CI/CD Pipelines with GitHub Actions for Kubernetes Policy Checks

Introduction

As Kubernetes clusters become an integral part of infrastructure, maintaining compliance with security and configuration policies is crucial. Kyverno, a policy engine designed for Kubernetes, can be integrated into your CI/CD pipelines to enforce configuration standards and automate policy checks. In this article, we’ll walk through integrating Kyverno CLI with GitHub Actions, providing a seamless workflow for validating Kubernetes manifests before they reach your cluster.

What is Kyverno CLI?

Kyverno is a Kubernetes-native policy management tool, enabling users to enforce best practices, security protocols, and compliance across clusters. Kyverno CLI is a command-line interface that lets you apply, test, and validate policies against YAML manifests locally or in CI/CD pipelines. By integrating Kyverno CLI with GitHub Actions, you can automate these policy checks, ensuring code quality and compliance before deploying resources to Kubernetes.

Benefits of Using Kyverno CLI in CI/CD Pipelines

Integrating Kyverno into your CI/CD workflow provides several advantages:

  1. Automated Policy Validation: Detect policy violations early in the CI/CD pipeline, preventing misconfigured resources from deployment.
  2. Enhanced Security Compliance: Kyverno enables checks for security best practices and compliance frameworks.
  3. Faster Development: Early feedback on policy violations streamlines the process, allowing developers to fix issues promptly.

Setting Up Kyverno CLI in GitHub Actions

Step 1: Install Kyverno CLI

To use Kyverno in your pipeline, you need to install the Kyverno CLI in your GitHub Actions workflow. You can specify the Kyverno version required for your project or use the latest version.

Here’s a sample GitHub Actions YAML configuration to install Kyverno CLI:

name: CI Pipeline with Kyverno Policy Checks

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  kyverno-policy-check:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Code
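        uses: actions/checkout@v4

      - name: Install Kyverno CLI
        run: |
          # Release assets embed the version and platform in the file name
          curl -LO https://github.com/kyverno/kyverno/releases/download/v<version>/kyverno-cli_v<version>_linux_x86_64.tar.gz
          tar -xzf kyverno-cli_v<version>_linux_x86_64.tar.gz
          sudo mv kyverno /usr/local/bin/

Replace <version> with the Kyverno CLI version you wish to use (for example, 1.12.0). Pin an explicit version rather than tracking the latest release: the asset name embeds the version, so there is no stable "latest" download URL, and a pinned version keeps policy results reproducible across runs.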

Step 2: Define Policies for Validation

Create a directory in your repository to store Kyverno policies. These policies define the standards that your Kubernetes resources should comply with. For example, create a directory structure as follows:

.
└── .github
    └── policies
        ├── disallow-latest-tag.yaml
        └── require-requests-limits.yaml

Each policy is defined in YAML format and can be customized to meet specific requirements. Below are examples of policies that might be used:

  • Disallow latest Tag in Images: Prevents the use of the latest tag to ensure version consistency.
  • Enforce CPU/Memory Limits: Ensures resource limits are set for containers, which can prevent resource abuse.
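
The two policy files might look like the sketch below, modelled on the corresponding entries in the Kyverno policy library. They are shown together here, separated by ---, but would live in the two files from the directory tree above; the messages and the choice of Enforce are assumptions.

```yaml
# .github/policies/disallow-latest-tag.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
  - name: require-image-tag
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "An image tag is required."
      pattern:
        spec:
          containers:
          - image: "*:*"        # every image must carry an explicit tag
  - name: validate-image-tag
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Using the mutable tag 'latest' is not allowed."
      pattern:
        spec:
          containers:
          - image: "!*:latest"  # "!" negates the pattern
---
# .github/policies/require-requests-limits.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-resources
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "CPU and memory requests and a memory limit are required."
      pattern:
        spec:
          containers:
          - resources:
              requests:
                cpu: "?*"       # "?*" means any non-empty value
                memory: "?*"
              limits:
                memory: "?*"
```

Although the rules match Pods, Kyverno generates equivalent rules for pod controllers such as Deployments, so the same policies apply to Deployment manifests.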

Step 3: Add a GitHub Actions Step to Validate Manifests

In this step, you’ll use Kyverno CLI to validate Kubernetes manifests against the policies defined in the .github/policies directory. If a manifest fails validation, the pipeline will halt, preventing non-compliant resources from being deployed.

Here’s the YAML configuration to validate manifests:

- name: Validate Kubernetes Manifests
  run: |
    kyverno apply .github/policies -r manifests/

Replace manifests/ with the path to your Kubernetes manifests in the repository. This command applies all policies in .github/policies against each YAML file in the manifests directory, stopping the pipeline if any non-compliant configurations are detected.

Step 4: Handle Validation Results

To make the output of Kyverno CLI more readable, you can use additional GitHub Actions steps to format and handle the results. For instance, you might set up a conditional step to notify the team if any manifest is non-compliant:

- name: Check for Policy Violations
  if: failure()
  run: echo "Policy violation detected. Please review the failed validation."

Alternatively, you could configure notifications to alert your team through Slack, email, or other integrations whenever a policy violation is identified.
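
One lightweight option that needs no external integration is to publish the Kyverno output to the GitHub Actions job summary. The sketch below is a variant of the validation step from Step 3; the report file name kyverno-report.txt is an assumption.

```yaml
- name: Validate Kubernetes Manifests
  run: |
    # pipefail keeps kyverno's non-zero exit code even though the
    # output is piped through tee
    set -o pipefail
    kyverno apply .github/policies -r manifests/ | tee kyverno-report.txt

- name: Publish policy report to the job summary
  if: always()   # run even when the validation step failed
  run: |
    echo "### Kyverno policy check" >> "$GITHUB_STEP_SUMMARY"
    cat kyverno-report.txt >> "$GITHUB_STEP_SUMMARY"
```

The summary then appears on the workflow run page, so reviewers can see which rule failed without opening the raw logs.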

Example: Validating a Kubernetes Manifest

Suppose you have a manifest defining a Kubernetes deployment as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest  # Should trigger a violation

The policy disallow-latest-tag.yaml checks if any container image uses the latest tag and rejects it. When this manifest is processed, Kyverno CLI flags the image and halts the CI/CD pipeline with an error, preventing the deployment of this manifest until corrected.

Conclusion

Integrating Kyverno CLI into a GitHub Actions CI/CD pipeline offers a robust, automated solution for enforcing Kubernetes policies. With this setup, you can ensure Kubernetes resources are compliant with best practices and security standards before they reach production, enhancing the stability and security of your deployments.

Extending Kyverno Policies: Creating Custom Rules for Kubernetes Security

Kyverno offers a robust, declarative approach to enforcing security and compliance standards within Kubernetes clusters by allowing users to define and enforce custom policies. For an in-depth look at Kyverno’s functionality, including core concepts and benefits, see my detailed article here. In this guide, we’ll focus on extending Kyverno policies, providing a structured walkthrough of its data model, and illustrating use cases to make the most of Kyverno in a Kubernetes environment.

Understanding the Kyverno Policy Data Model

Kyverno policies consist of several components that define how the policy should behave, which resources it should affect, and the specific rules that apply. Let’s dive into the main parts of the Kyverno policy model:

  1. Policy Definition: This is the root configuration where you define the policy’s metadata, including name, type, and scope. A namespaced Policy applies only within its own namespace, while a ClusterPolicy enforces uniform standards across the entire Kubernetes cluster.
  2. Rules: Policies are made up of rules that dictate what conditions Kyverno should enforce. Each rule can include logic for validation, mutation, or generation based on your needs.
  3. Match and Exclude Blocks: These sections allow fine-grained control over which resources the policy applies to. You can specify resources by their kinds (e.g., Pods, Deployments), namespaces, labels, and even specific names. This flexibility is crucial for creating targeted policies that impact only the resources you want to manage.
    1. Match block: Defines the conditions under which the rule applies to specific resources.
    2. Exclude block: Explicitly omits resources that match certain conditions, so that resources meant to stay unaffected are not inadvertently included.
  4. Validation, Mutation, and Generation Actions: Each rule can take different types of actions:
    1. Validation: Checks that resources meet specific criteria. In Enforce mode Kyverno blocks non-compliant resources at admission; in Audit mode it only reports them.
    2. Mutation: Adjusts resource configurations to align with predefined standards, which is useful for auto-remediation.
    3. Generation: Creates or manages additional resources based on existing resource configurations.
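
To make the data model concrete, the skeleton below shows where each part sits in a policy. It is a sketch, not a recommended policy: the policy name, the team label, and the excluded namespace are hypothetical choices made for illustration.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy              # policy definition: cluster-wide scope
metadata:
  name: require-team-label       # hypothetical name
spec:
  validationFailureAction: Audit # report violations without blocking
  rules:                         # one or more rules per policy
    - name: check-team-label
      match:                     # match block: which resources the rule covers
        any:
          - resources:
              kinds:
                - Pod
      exclude:                   # exclude block: carve-outs from the match
        any:
          - resources:
              namespaces:
                - kube-system
      validate:                  # action: validate, mutate, or generate
        message: "The label `team` is required."
        pattern:
          metadata:
            labels:
              team: "?*"         # any non-empty value
```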

Example: Restricting Container Image Sources to Docker Hub

A common security requirement is to limit container images to trusted registries. The example below demonstrates a policy that only permits images from Docker Hub.

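apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-dockerhub-images
spec:
  # Enforce rejects violating Pods at admission; the default, Audit, only reports them.
  # Newer Kyverno releases prefer a per-rule validate.failureAction field.
  validationFailureAction: Enforce
  rules:
    - name: only-dockerhub-images
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Only Docker Hub images are allowed."
        pattern:
          spec:
            containers:
              # Matches the literal image string, so the registry prefix must be explicit.
              - image: "docker.io/*"

This policy targets all Pod resources in the cluster and validates that every container image starts with docker.io/. Whether a non-matching Pod is rejected or merely reported depends on the policy’s failure action: with Enforce, Kyverno denies the Pod at admission, while Audit only records a violation. Note that the pattern is compared against the image string exactly as written, so a shorthand reference such as nginx:1.25 does not match docker.io/* even though the container runtime resolves it to Docker Hub; it has to be written as docker.io/library/nginx:1.25.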
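
The pattern above inspects only spec.containers, so init containers and ephemeral containers are not checked. A possible extension, sketched here under the assumption that a second, hypothetical registry (registry.example.com) is also trusted, uses Kyverno’s conditional anchor =( ) to validate those fields when they are present and the | operator to allow more than one registry:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries   # hypothetical name
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from an approved registry."
        pattern:
          spec:
            # =( ) means: apply the check only if the field exists.
            =(initContainers):
              - image: "docker.io/* | registry.example.com/*"
            =(ephemeralContainers):
              - image: "docker.io/* | registry.example.com/*"
            containers:
              - image: "docker.io/* | registry.example.com/*"
```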

Practical Use-Cases for Kyverno Policies

Kyverno policies can handle a variety of Kubernetes management tasks through validation, mutation, and generation. Let’s explore examples for each type to illustrate Kyverno’s versatility:

1. Validation Policies

Validation policies in Kyverno ensure that resources comply with specific configurations or security standards, stopping any non-compliant resources from deploying.

Use-Case: Enforcing Resource Limits for Containers

This example prevents deployments that lack resource limits, ensuring all Pods specify CPU and memory constraints.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-resource-limits
spec:
  # Without Enforce the policy runs in Audit mode and blocks nothing.
  validationFailureAction: Enforce
  rules:
    - name: require-resource-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Resource limits (CPU and memory) are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    # "?*" matches any non-empty value.
                    cpu: "?*"
                    memory: "?*"

By enforcing resource limits, this policy helps prevent resource contention in the cluster, fostering stable and predictable performance.

2. Mutation Policies

Mutation policies allow Kyverno to automatically adjust configurations in resources to meet compliance requirements. This approach is beneficial for consistent configurations without manual intervention.

Use-Case: Adding Default Labels to Pods

This policy adds a default label, environment: production, to all new Pods that lack this label, ensuring that resources align with organization-wide labeling standards.

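apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-label
spec:
  rules:
    - name: add-environment-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              # The +( ) anchor adds the label only when it is absent,
              # so an existing environment value is not overwritten.
              +(environment): "production"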

This mutation policy is an example of how Kyverno can standardize resource configurations at scale by dynamically adding missing information, reducing human error and ensuring labeling consistency.

3. Generation Policies

Generation policies in Kyverno are used to create or update related resources, enhancing Kubernetes automation by responding to specific configurations or needs in real-time.

Use-Case: Automatically Creating a ConfigMap for Each New Namespace

This example policy generates a ConfigMap in every new namespace, providing default configuration values that workloads in that namespace can reference.

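apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-configmap
spec:
  rules:
    - name: add-default-configmap
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: v1        # required so Kyverno can identify the resource type
        kind: ConfigMap
        name: default-config
        namespace: "{{request.object.metadata.name}}"
        synchronize: true     # keep the generated ConfigMap in sync with this policy
        data:
          # The resource body goes under generate.data, so the ConfigMap's
          # own data field is nested one level down.
          data:
            default-key: "default-value"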

This generation policy is triggered whenever a new namespace is created, automatically provisioning a ConfigMap with default settings. This approach is especially useful in multi-tenant environments, ensuring new namespaces have essential configurations in place.
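
The same mechanism lends itself to security defaults. As a sketch, the policy below generates a default-deny NetworkPolicy in each new namespace; the policy and resource names are hypothetical, and depending on your Kyverno version the controller may need additional RBAC permissions before it can create NetworkPolicy objects.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-deny       # hypothetical name
spec:
  rules:
    - name: add-default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            podSelector: {}         # empty selector: every Pod in the namespace
            policyTypes:
              - Ingress
              - Egress
```

With no ingress or egress rules listed, the generated NetworkPolicy blocks all traffic to and from Pods in the namespace until more specific policies allow it, provided the cluster's network plugin enforces NetworkPolicy.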

Conclusion

Extending Kyverno policies enables Kubernetes administrators to establish and enforce tailored security and operational practices within their clusters. By leveraging Kyverno’s capabilities in validation, mutation, and generation, you can automate compliance, streamline operations, and reinforce security standards seamlessly.

Prevent Server Information Disclosure in Kubernetes with Istio Service Mesh

In today’s digital landscape, where data breaches and cyber threats are becoming increasingly sophisticated, ensuring the security of your servers is paramount. One of the critical security concerns that organizations must address is “Server Information Disclosure.” Server Information Disclosure occurs when sensitive information about a server’s configuration, technology stack, or internal structure is inadvertently exposed to unauthorized parties. Hackers can exploit this vulnerability to gain insights into potential weak points and launch targeted attacks. Such breaches can lead to data theft, service disruption, and reputation damage.

Information Disclosure and Istio Service Mesh

One example is the Server HTTP header, which most HTTP responses include to identify the software that produced the response. The value varies with the stack: Jetty, Tomcat, and similar names are common. If you use a service mesh such as Istio, the header carries the value istio-envoy, as you can see here:

Information Disclosure of Server Implementation using Istio Service mesh

Hiding this information matters on several levels:

  • Data Privacy: Leaked server details can help an attacker reach confidential data, undermining user trust and risking violations of data privacy regulations such as GDPR and HIPAA.
  • Reduced Attack Surface: By concealing server details, you minimize the attack surface available to potential attackers.
  • Security by Obscurity: While not a foolproof approach, limiting disclosure adds an extra layer of security, making it harder for hackers to gather intelligence.

How to mitigate that with Istio Service Mesh?

With Istio, you can define rules that add or remove HTTP headers as needed, as discussed here: https://discuss.istio.io/t/remove-header-operation/1692. It takes only a few lines in the definition of your VirtualService:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: k8snode-virtual-service
spec:
  hosts:
  - "example.com"
  gateways:
  - k8snode-gateway
  http:
  # headers and route belong to the same HTTP route entry
  - headers:
      response:
        remove:
        - "x-my-fault-source"
    route:
    - destination:
        host: k8snode-service
        subset: version-1

Unfortunately, this does not work for every HTTP header. It handles custom headers added by your workloads, but not the “main” ones defined in the HTTP standard (https://www.w3.org/Protocols/).

Removing the Server header is therefore a bit more involved: you need an EnvoyFilter, one of the most sophisticated objects in the Istio Service Mesh. In the words of the official Istio documentation, an EnvoyFilter provides a mechanism to customize the Envoy configuration generated by Istio Pilot. You can use it to modify values for certain fields, add specific filters, or even add entirely new listeners, clusters, and so on.

EnvoyFilter Implementation to Remove Header

Now that we know a custom EnvoyFilter is required, let’s look at the one that removes the Server header and at how it is built, to better understand this component:

---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: gateway-response-remove-headers
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: "envoy.filters.network.http_connection_manager"
    patch:
      operation: MERGE
      value:
        typed_config:
          "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager"
          server_header_transformation: PASS_THROUGH
  - applyTo: ROUTE_CONFIGURATION
    match:
      context: GATEWAY
    patch:
      operation: MERGE
      value:
        response_headers_to_remove:
        - "server"

Let’s walk through the specification. It starts with the usual workloadSelector, which determines where the filter is applied: in this case, the Istio ingress gateway. Then comes the configPatches section, which holds the customizations. Here there are two of them.

Both act on context: GATEWAY and apply to two different objects: NETWORK_FILTER and ROUTE_CONFIGURATION. (You can also apply filters to sidecars to change their behavior.) The first patch merges a setting into the http_connection_manager filter, which handles the HTTP context, including headers: server_header_transformation: PASS_THROUGH tells Envoy to stop overwriting the Server header with its own value. The second patch acts on the ROUTE_CONFIGURATION and removes the server header through the response_headers_to_remove option.

Conclusion

As you can see, this is not easy to implement. At the same time, it shows the power and low-level control that a robust service mesh such as Istio gives you: you can adjust even the smallest details of its behavior, in this case to improve the security of the workloads deployed behind the mesh.

In the ever-evolving landscape of cybersecurity threats, safeguarding your servers against information disclosure is crucial to protect sensitive data and maintain your organization’s integrity. Istio empowers you to fortify your server security by providing robust tools for traffic management, encryption, and access control.

Remember, the key to adequate server security is a proactive approach that addresses vulnerabilities before they can be exploited. Take the initiative to implement Istio and elevate your server protection.
