Kubernetes Security Best Practices: A Production Hardening Guide

Kubernetes security is not a single feature you enable — it is a layered discipline that spans the control plane, workloads, networking, supply chain, and runtime. This guide covers the security controls that matter most in production, why each one exists, and how to implement them without breaking your cluster.

The Kubernetes Attack Surface

Before hardening anything, understand what you are protecting. A Kubernetes cluster has several distinct attack surfaces:

  • API server — The central control plane. Any entity that can reach it with valid credentials can read cluster state, modify workloads, or escalate privileges.
  • etcd — Stores all cluster state in plain text, including Secrets. Direct etcd access is equivalent to root on every node.
  • Nodes — A compromised node can access all Secrets mounted on pods running on it, access the kubelet API, and potentially escape to the hypervisor.
  • Pods — Privileged pods, host-network pods, and pods with excessive capabilities can break container isolation.
  • Supply chain — Malicious images, compromised registries, and unsigned artifacts can introduce attacker-controlled code into your cluster.
  • RBAC — Overly permissive roles allow lateral movement and privilege escalation once an attacker gains any foothold.

The controls below address each of these surfaces. Prioritize based on your threat model — a public-facing multi-tenant cluster needs all of them; an internal development cluster can relax some.

1. RBAC: Least Privilege from Day One

Role-Based Access Control is Kubernetes’ primary authorization mechanism. Most clusters fail at RBAC not because it is misconfigured, but because it is over-permissive by default and nobody reviews it systematically.

Common RBAC Mistakes

  • Binding to cluster-admin for convenience. Almost no workload needs cluster-admin. Use namespaced roles wherever possible.
  • Using * verbs or resources in roles. Wildcard permissions are almost always broader than intended.
  • Not auditing ServiceAccount token usage. Every pod gets a ServiceAccount. The default ServiceAccount in most namespaces has no permissions, but custom workloads often get over-permissive SAs.
  • Forgetting automountServiceAccountToken: false. If a workload does not need to talk to the Kubernetes API, disable token mounting entirely.
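The last point can be set on the ServiceAccount, the pod spec, or both. A minimal sketch (the ServiceAccount, namespace, and image names are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: no-api-access        # illustrative name
  namespace: my-app
automountServiceAccountToken: false
---
apiVersion: v1
kind: Pod
metadata:
  name: worker
  namespace: my-app
spec:
  serviceAccountName: no-api-access
  # Can also be set per-pod; the pod-level field overrides the ServiceAccount
  automountServiceAccountToken: false
  containers:
  - name: app
    image: registry.company.com/worker:1.0
```

With this in place, no token is projected into the pod filesystem, so a compromised container cannot talk to the Kubernetes API at all.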

Practical RBAC Patterns

For a workload that only needs to read ConfigMaps in its own namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-configmap-reader
  namespace: my-app
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: my-app
roleRef:
  kind: Role
  name: configmap-reader
  apiGroup: rbac.authorization.k8s.io

Audit existing RBAC with kubectl-who-can or rbac-tool to find overly permissive bindings before attackers do.
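With kubectl-who-can installed (for example via krew), a quick audit might look like the following sketch; the namespace and ServiceAccount names are illustrative:

```shell
# Who can read Secrets in the my-app namespace?
kubectl who-can get secrets -n my-app

# Who can create RoleBindings (a common privilege-escalation path)?
kubectl who-can create rolebindings -n my-app

# Built-in alternative: list everything a specific ServiceAccount may do
kubectl auth can-i --list --as=system:serviceaccount:my-app:my-app
```

The `kubectl auth can-i --list` form needs no plugin and is useful for spot-checking a workload's effective permissions.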

2. Pod Security Standards

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement is Pod Security Admission (PSA), a built-in admission controller that enforces one of three security profiles at the namespace level:

  • Privileged — No restrictions. For system components only.
  • Baseline — Prevents the most critical privilege escalations: privileged containers, hostPID, hostIPC, hostNetwork, dangerous capabilities.
  • Restricted — Enforces current hardening best practices. Requires running as non-root, dropping all capabilities, and using a restricted seccomp profile.

Enable enforcement at the namespace level with labels:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.30
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.30
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.30

A pod that runs as root or requests host-network in a namespace enforcing restricted will be rejected at admission. The warn and audit modes let you test before enforcing.
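Before flipping a namespace to enforce, you can also dry-run the label change against the live API server to see which existing workloads would violate the profile (namespace name is illustrative):

```shell
# Server-side dry run: prints PSA warnings for existing pods
# without actually changing the namespace labels
kubectl label --dry-run=server --overwrite ns production \
  pod-security.kubernetes.io/enforce=restricted
```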

PSA covers the most critical pod-level escalations, but it is coarse-grained. For fine-grained policy control, use Kyverno alongside PSA.
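As an illustration of the finer granularity Kyverno adds on top of PSA, here is a sketch adapted from Kyverno's public sample policies that rejects mutable :latest image tags, something PSA cannot express:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-image-tag
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Images must use a pinned tag, not :latest"
      pattern:
        spec:
          containers:
          # Negated wildcard: any image ending in :latest is rejected
          - image: "!*:latest"
```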

3. Network Policies: Micro-Segmentation

By default, every pod in a Kubernetes cluster can communicate with every other pod across all namespaces. This is a flat network model that gives attackers unrestricted lateral movement once they compromise any workload.

Network Policies define L3/L4 allow-rules for pod-to-pod communication. They are enforced by your CNI plugin (Calico, Cilium, Weave — not Flannel, which does not support NetworkPolicy).

Default Deny Pattern

Start by denying all ingress and egress in every namespace, then open only what is explicitly needed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Then allow specific traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api
    ports:
    - protocol: TCP
      port: 5432

Do not forget DNS egress — most workloads need to resolve names via kube-dns/CoreDNS, which requires port 53 egress (UDP, and TCP for large responses) to the kube-system namespace.
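A DNS allowance to pair with the default-deny policy might look like this sketch; the selector relies on the kubernetes.io/metadata.name label that Kubernetes sets automatically on every namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```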

4. Secrets Management

Kubernetes Secrets are base64-encoded, not encrypted, and they are stored in etcd in plain text by default. Anyone with get permission on Secrets can read them. This is not a vulnerability — it is a design decision that puts the responsibility on you to:

  • Enable encryption at rest for etcd. Configure EncryptionConfiguration with an AES-CBC or AES-GCM provider. This encrypts Secrets before they are written to etcd.
  • Use external secret stores. HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault with the External Secrets Operator means actual secret values never live in Kubernetes at all.
  • Restrict Secret RBAC aggressively. Never give list on Secrets cluster-wide — it returns all values. Use get on named resources where possible.
  • Avoid environment variables for secrets. Prefer volume mounts. Env vars are visible in pod inspect output and can leak through application logging.

# etcd encryption at rest - in kube-apiserver config
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}
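As an illustration of the external-store approach, an External Secrets Operator manifest might look roughly like this sketch; the SecretStore name and the Vault path are hypothetical:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: my-app
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend        # hypothetical SecretStore pointing at Vault
    kind: SecretStore
  target:
    name: db-credentials       # Kubernetes Secret the operator creates and syncs
  data:
  - secretKey: password
    remoteRef:
      key: secret/data/my-app/db   # hypothetical Vault path
      property: password
```

The actual credential lives only in Vault; the operator materializes it in-cluster and keeps it in sync on the refresh interval.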

5. Image Security and Supply Chain

Your runtime security posture is only as good as the images you run. A compromised image from a public registry bypasses every runtime control you have.

Scan images in CI

Use Trivy, Grype, or Snyk to scan images as part of your CI pipeline. Block deployments of images with critical CVEs:

# In your CI pipeline
trivy image --exit-code 1 --severity CRITICAL your-image:tag

Use a private registry with admission control

Only allow images from your private registry using an admission webhook (Kyverno, OPA Gatekeeper). This prevents developers from running arbitrary public images in production:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-registries
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Images must come from registry.company.com"
      pattern:
        spec:
          containers:
          - image: "registry.company.com/*"

Use distroless or minimal base images

Distroless images contain only the application and its runtime dependencies — no shell, no package manager, no debugging tools. This drastically reduces the attack surface and the number of CVEs. Google’s distroless images are available for Java, Node.js, Python, and Go.

Sign and verify images

Cosign (from the Sigstore project) lets you sign container images and verify signatures at admission time using Kyverno or Connaisseur. This prevents image substitution attacks where an attacker replaces a legitimate image in your registry.
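A minimal sketch of the key-pair workflow with cosign (the image reference is illustrative):

```shell
# Generate a signing key pair (writes cosign.key / cosign.pub)
cosign generate-key-pair

# Sign the image in the registry
cosign sign --key cosign.key registry.company.com/api:1.4.2

# Verify the signature - this is the check an admission
# controller like Kyverno or Connaisseur automates at deploy time
cosign verify --key cosign.pub registry.company.com/api:1.4.2
```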

6. Runtime Security

Runtime security detects and responds to malicious activity after a container is running. The primary tool in this space is Falco — a CNCF project that uses eBPF to monitor system calls and raise alerts when containers behave unexpectedly.

Default Falco rules catch common attack patterns:

  • Shell spawned in a container
  • Network connection to an unexpected IP
  • Write to a sensitive file path (/etc/passwd, /etc/shadow)
  • Privilege escalation via setuid binaries
  • Container drift (new executable files written at runtime)
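Falco rules are plain YAML. A simplified sketch of a shell-detection rule (the real bundled rule carries many more exceptions for legitimate tooling) might look like:

```yaml
- rule: Terminal shell in container
  desc: A shell was spawned inside a container, which rarely happens legitimately
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name
     image=%container.image.repository cmdline=%proc.cmdline)
  priority: WARNING
```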

Combine Falco with seccomp profiles to restrict the system calls a container can make at the kernel level. The RuntimeDefault seccomp profile applies your container runtime's default policy, which blocks the dozens of obscure system calls that containers virtually never need; since Kubernetes 1.27 the kubelet can apply it cluster-wide by default (the SeccompDefault feature went stable in that release).

spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 65534
      capabilities:
        drop: ["ALL"]

These four securityContext settings together (allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsNonRoot: true, capabilities.drop: ALL) make container escape significantly harder and satisfy the Kubernetes Restricted pod security standard.

7. API Server Hardening

The API server is the most critical component to harden. Key settings:

  • Disable anonymous authentication. --anonymous-auth=false ensures every request is authenticated.
  • Enable audit logging. Log all API server requests to a file or webhook. Without audit logs, you cannot investigate incidents or detect RBAC abuse.
  • Restrict admission plugins. Ensure NodeRestriction is enabled — it prevents node kubelets from modifying objects outside their own node.
  • Do not expose the API server to the internet. Use a VPN, bastion host, or private endpoint. If you must expose it, restrict access by IP.

# Minimal audit policy - log all requests at Metadata level.
# Do not log Secrets at RequestResponse level: that would record
# the secret values themselves in the audit log.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: Metadata
  omitStages: ["RequestReceived"]

8. etcd Security

etcd stores all cluster state. Treat it as sensitive as your production database:

  • Enable TLS for all etcd communication. Both peer communication (etcd-to-etcd) and client communication (apiserver-to-etcd) must use mutual TLS.
  • Restrict network access to etcd. etcd should only be reachable by the API server. Use firewall rules or security groups to enforce this.
  • Enable encryption at rest. As described in the Secrets section above.
  • Backup etcd regularly. An etcd snapshot is a complete copy of all cluster state, including all Secrets. Encrypt backups and store them separately from the cluster.
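A backup sketch using etcdctl over mutual TLS; the certificate paths shown are typical for kubeadm-managed clusters but depend on your installation:

```shell
# Take an etcd snapshot over mTLS
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Sanity-check that the snapshot is readable
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-$(date +%F).db
```

Remember that the snapshot contains every Secret in the cluster: encrypt it and store it away from the cluster itself.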

9. CIS Kubernetes Benchmark

The CIS Kubernetes Benchmark is a comprehensive checklist of security controls covering the control plane, nodes, and workloads. Running kube-bench against your cluster gives you a scored assessment against the CIS controls:

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs $(kubectl get pods -l app=kube-bench -o name)

kube-bench outputs PASS/FAIL/WARN for each control with remediation guidance. Run it after initial cluster setup and after major configuration changes.

10. Continuous Security Posture with Kubescape

Kubescape and similar tools (Starboard/Trivy Operator, KubeScore) provide continuous security scanning of live cluster state — not just a one-time audit. They check workloads against NSA/CISA hardening guidelines, MITRE ATT&CK framework, and the CIS benchmark in real time.

Deploy Trivy Operator for continuous in-cluster scanning:

helm repo add aquasecurity https://aquasecurity.github.io/helm-charts/
helm install trivy-operator aquasecurity/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  --set="trivy.ignoreUnfixed=true"

Trivy Operator creates VulnerabilityReport, ConfigAuditReport, and RbacAssessmentReport custom resources in the same namespace as each workload. These can be scraped by Prometheus and displayed in Grafana for a security dashboard.
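Once the operator is running, the reports are ordinary custom resources you can query with kubectl (the report name in the last command is illustrative):

```shell
# Vulnerability findings per workload, across all namespaces
kubectl get vulnerabilityreports -A

# Misconfiguration audit results
kubectl get configauditreports -A

# Inspect a single report in detail
kubectl describe vulnerabilityreport -n my-app replicaset-api-xyz
```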

Security Hardening Checklist

  • ✅ RBAC reviewed — no wildcard roles, no unnecessary cluster-admin bindings
  • ✅ ServiceAccount token automount disabled for workloads that do not need API access
  • ✅ Pod Security Standards enforced at namespace level (at least Baseline, Restricted where possible)
  • ✅ Network policies deployed — default deny with explicit allows
  • ✅ Secrets encrypted at rest in etcd
  • ✅ Images scanned in CI — no critical CVEs in production
  • ✅ Private registry enforced via admission control
  • ✅ Container securityContext hardened (non-root, read-only filesystem, no capabilities)
  • ✅ seccomp RuntimeDefault profile enabled
  • ✅ API server audit logging enabled
  • ✅ etcd TLS and network access restricted
  • ✅ kube-bench run and critical/high findings remediated
  • ✅ Runtime security (Falco) deployed and alerts routed to on-call
  • ✅ Continuous scanning (Trivy Operator or Kubescape) deployed

FAQ

Where do I start if my cluster has no security controls today?

Start with the highest-impact, lowest-effort controls first: audit your RBAC (revoke cluster-admin where not needed), enable Pod Security Admission in warn mode on all namespaces, and deploy Trivy Operator. These three steps give you immediate visibility and prevent the most common privilege escalations without breaking anything.

Does enabling Network Policies break DNS resolution?

Yes, if you deploy a default-deny egress policy without explicitly allowing DNS. Add an egress rule allowing UDP port 53 to the kube-dns service in kube-system when applying default-deny network policies.

Is Kubernetes certified for PCI-DSS or SOC 2?

Kubernetes itself is not certified — your configuration and the controls you implement determine compliance. The CIS Kubernetes Benchmark maps to many PCI-DSS and SOC 2 requirements. Managed Kubernetes offerings (EKS, GKE, AKS) have their own compliance certifications for the underlying infrastructure.

Should I use OPA Gatekeeper or Kyverno?

Both enforce admission policies, but Kyverno is Kubernetes-native (policies are written as YAML) while Gatekeeper uses Rego (a purpose-built policy language). For teams without Rego expertise, Kyverno is significantly faster to adopt and maintain. For teams already using OPA elsewhere in their stack, Gatekeeper offers consistency. Both integrate well with GitOps workflows.

How often should I update Kubernetes for security patches?

Follow a patch release within 30 days of release for CVEs rated High or Critical. Minor version upgrades (e.g., 1.29 → 1.30) should happen within the support window — Kubernetes maintains the last three minor versions. Falling more than one minor version behind means running without security patches for a growing subset of the codebase.

For a deeper look at how security fits into the broader Kubernetes platform architecture, see the Kubernetes architecture patterns guide and the guide on building a security-first Kubernetes culture.

DevSecOps vs DevOps: Key Differences Explained by Answering 3 Core Questions

DevSecOps is a concept you have probably heard a lot in recent months, usually mentioned alongside the traditional idea of DevOps. At some point this probably makes you wonder about a DevSecOps vs DevOps comparison: what are the main differences between them, or are they the same concept? And with other ideas appearing as well, such as Platform Engineering or Site Reliability Engineering, there is growing confusion in the field that I would like to clarify in this article.

What is DevSecOps?

DevSecOps is an extension of the DevOps concept and methodology. Instead of a joint effort between Development and Operations alone, it is a joint effort among Development, Security, and Operations.

Diagram by GeekFlare: A DevSecOps Introduction (https://geekflare.com/devsecops-introduction/)

It implies introducing security policies, practices, and tools to ensure that the DevOps cycle provides security throughout the process. We have already commented on including security components to provide a more secure deployment process, and we even have specific articles about these tools, such as scanners, Docker registries, etc.

Why is DevSecOps important?

DevSecOps — or, to be more explicit, including security practices as part of the DevOps process — is critical because we are moving to hybrid and cloud architectures where we incorporate new design, deployment, and development patterns such as containers and microservices.

This shift means we are moving from hundreds of applications in the most complex cases to thousands of applications, and from dozens of servers to thousands of containers, each with different base images and third-party libraries that can be obsolete, contain a security hole, or be hit by newly disclosed vulnerabilities — as we have seen with the Spring Framework or the Log4j library, to cite some of the most recent substantial global security issues companies have dealt with.

So even the most extensive security team cannot keep pace with all these new security challenges by checking them manually or with a set of scripts, unless we include the checks as part of the overall development and deployment process. This is where the concept of shift-left security usually comes in, which we have already covered in a separate article.

DevSecOps vs DevOps: Is DevSecOps just updated DevOps?

Based on the above definition, you may think: “OK, so when somebody talks about DevOps, they are not thinking about security.” This is not true.

In the same way, when we talk about DevOps we do not explicitly list all the detailed steps, such as software quality assurance, unit testing, etc. As happens with many extensions in this industry, the original, global, or generic concept includes the content of its extensions as well.

So, in the end, DevOps and DevSecOps are the same thing, especially today when all companies and organizations are moving to cloud or hybrid environments where security is critical and non-negotiable. Every task we perform, from developing software to accessing any service, needs to be done with security in mind. Still, I use the two terms in different scenarios: I will use DevSecOps when I want to explicitly highlight the security aspect because of the audience, the context, or the topic under discussion.

In any generic context, however, DevOps will certainly include the security checks — because if it does not, it is just useless.

Summary

So, in the end, when somebody speaks today about DevOps, it implicitly includes the security aspect, and there is no real difference between the two concepts. But you will also find it helpful to use the specific term DevSecOps when you want to highlight or differentiate this part of the process.

CICD Docker: Top 3 Reasons Why Using Containers In Your DevSecOps pipeline

Improve the performance and productivity of your DevSecOps pipeline using containers.

CICD Docker refers to the approach most companies are using to introduce containers into the build and pre-deployment phases, implementing part of the CICD pipeline with them. Let’s see why.

DevSecOps is the new normal for deployments at scale in large enterprises that need to meet the pace required by digital business nowadays. These processes are orchestrated using a CICD orchestration tool that acts as the brain of the process. The usual tools for this job are Jenkins, Bamboo, Azure DevOps, GitLab, and GitHub.

In the traditional approach, we have different worker servers handling the stages of the DevOps process (Code, Build, Test, Deploy), and each stage needs different tools and utilities to do the job. For example, to get the code we may need Git installed; to do the build we can rely on Maven or Gradle; to test we can use SonarQube; and so on.

CICD Docker Structure and the relationship between Orchestrator and Workers

So, in the end, we need a set of tools to perform these stages successfully, and that set also requires management. Nowadays, with the rise of cloud-native development and the container approach across the industry, this is also changing the way you build your pipelines: containers become part of each stage.

In most CI orchestrators, you can define a container image to run any step of your DevSecOps process, and doing so brings several benefits you should be aware of.
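For example, in GitLab CI each job can declare the image it runs in, so the worker needs only a container runtime; the image names and stages in this sketch are illustrative:

```yaml
# .gitlab-ci.yml (sketch)
stages: [build, test]

build:
  stage: build
  image: maven:3.9-eclipse-temurin-17   # build tooling lives in the image, not on the worker
  script:
    - mvn -B package

scan:
  stage: test
  image: aquasec/trivy:latest
  script:
    - trivy filesystem --exit-code 1 --severity CRITICAL .
```

Swapping Maven for Gradle, or adding a new technology, means changing an image reference rather than reinstalling software on every worker.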

1.- Much more scalable solution

One of the problems when you use a single orchestrator as the main element in your company is that it ends up being used by many different technologies — open-source, proprietary, code-based, visual development, and so on — which means you need to manage many things and install a lot of software on the workers.

Usually, you define some workers to build certain kinds of artifacts, as the image below shows:

Worker distribution based on its own capabilities

That is great because it allows segmentation of the build process and does not require all software to be installed on all machines, even when some tools are incompatible with each other.

But what happens if we need to deploy a lot of applications of one of the types shown in the picture above, like TIBCO BusinessWorks applications? You will be limited by the number of workers that have the software installed to build and deploy them.

With a container-based approach, all the workers are available because no pre-installed software is needed: you just pull the container image, and that’s it. You are only limited by the infrastructure you use, and if you adopt a cloud platform as part of the build process, even those limitations disappear. Your time to market and deployment pace improve.

2.- Easy to maintain and extend

If you remove the need to install software on and manage the workers — because they are spun up when needed and deleted when they are not, and all you have to do is create a container image that does the job — the time and effort teams spend maintaining and extending the solution drops considerably.

You also remove any upgrade process for the components involved in each step, as they follow the usual container image lifecycle.

3.- Avoid Orchestrator lock-in

As we rely on containers to do most of the job, the work needed to move from one DevOps solution to another is small. That gives us the control to choose at any moment whether the solution we are using is the best one for our use case and context, or whether we should move to a more optimized one, without having to justify big investments to do so.

You get control back, and you can even adopt a multi-orchestrator approach if needed — using the best solution for each use case and getting the benefits of each at the same time, without having to fight against any of them.

Summary

All the benefits we know from cloud-native development paradigms and containers are relevant not only for application development but also for the other processes we use in our organization — one of them being your DevSecOps pipeline. Start that journey today to get all those advantages in the build process, and do not wait until it is too late. Enjoy your day. Enjoy your life.
