Kubernetes security is not a single feature you enable — it is a layered discipline that spans the control plane, workloads, networking, supply chain, and runtime. Get one layer wrong and the others rarely save you. This guide covers the controls that matter most in production, why each one exists, and how to implement them without breaking your cluster — plus a prioritized roadmap so you know what to do in your first week, not just an undifferentiated list of “best practices.”
Let me start with the part most hardening guides skip: what an actual attack looks like.
Anatomy of a Real Kubernetes Attack Chain
Abstract advice (“apply least privilege”) doesn’t land until you’ve seen how a single misconfiguration cascades. Here is a realistic chain — every step maps to a documented technique in the MITRE ATT&CK for Containers matrix. If you haven’t seen it before, ATT&CK is an industry-standard, openly maintained knowledge base of real-world adversary behavior: a catalogue of how attackers actually operate, organized by goal (initial access, credential access, lateral movement, and so on). It’s the common language security teams use to describe and defend against attacks.
- Initial access. An application pod runs a vulnerable image: say, one with an unpatched dependency carrying a remote code execution (RCE) flaw, a bug that lets an attacker run arbitrary code inside the vulnerable process. The attacker gets code execution inside the container. So far, container isolation should contain the blast radius.
- Credential access. The pod has `automountServiceAccountToken: true` (the default). The attacker reads `/var/run/secrets/kubernetes.io/serviceaccount/token`, a valid API credential handed to them for free.
- Discovery. Using that token, the attacker queries the API server. The ServiceAccount was bound to a convenient `cluster-admin` role “to unblock a deploy,” so they can now list every Secret in every namespace.
- Lateral movement. They read database credentials, cloud provider keys, and other ServiceAccount tokens from Secrets. The flat pod network (no NetworkPolicies) lets them reach internal services directly.
- Privilege escalation / escape. They schedule a privileged pod with `hostPID` and the host filesystem mounted, then break out to the node. From the node, they reach the kubelet and other tenants’ workloads.
- Impact. Crypto-mining, data exfiltration, or ransomware across the cluster.
Notice that each of steps 2 through 5 had a small configuration fix: disable token automount (step 2), scope the RBAC (step 3), move credentials to an external store and apply default-deny NetworkPolicies (step 4), and enforce Pod Security (step 5). Defense in depth means an attacker has to defeat every layer, and most attackers give up when the easy chain breaks. The rest of this guide covers those layers, ordered by how much they shrink that chain.
The Kubernetes Attack Surface
Before hardening anything, understand what you are protecting. A Kubernetes cluster has several distinct attack surfaces:
- API server — The central control plane. Any entity that can reach it with valid credentials can read cluster state, modify workloads, or escalate privileges.
- etcd — Stores all cluster state, including Secrets, in plain text unless encryption at rest is enabled. Direct etcd access is equivalent to root on every node.
- Nodes — A compromised node can access all Secrets mounted on pods running on it, access the kubelet API, and potentially escape to the hypervisor.
- Pods — Privileged pods, host-network pods, and pods with excessive capabilities can break container isolation.
- Supply chain — Malicious images, compromised registries, and unsigned artifacts can introduce attacker-controlled code into your cluster.
- RBAC — Overly permissive roles allow lateral movement and privilege escalation once an attacker gains any foothold.
Prioritize based on your threat model — a public-facing multi-tenant cluster needs all of these; an internal development cluster can relax some.
The First-Week Hardening Roadmap (Prioritized)
If you inherited a cluster with nothing in place, do not try to do everything at once. Order matters — some controls give huge risk reduction for minimal effort and zero breakage risk, others need careful rollout. This is the sequence I use:
| Day | Control | Risk reduction | Breakage risk |
|---|---|---|---|
| 1 | Audit RBAC, remove stray cluster-admin, disable unused SA token automount | High | Low |
| 1 | Enable API server audit logging | Medium (visibility) | None |
| 2 | Pod Security Admission in warn + audit mode (all namespaces) | High | None (warn only) |
| 3 | Deploy image scanning in CI (Trivy/Grype), fail on Critical | High | Low |
| 4 | NetworkPolicies in audit-style rollout: default-deny in one namespace first | High | Medium — test DNS! |
| 5 | Enable etcd encryption at rest / move Secrets to external store | High | Low |
| 6 | Flip Pod Security Admission to enforce: baseline, then restricted per namespace | High | Medium |
| 7 | Deploy runtime detection (Falco) + continuous scanning (Trivy Operator) | Medium | None |
The single most important idea: roll out enforcing controls in observation mode first (warn/audit for Pod Security, default-deny NetworkPolicies in one namespace). You want to discover what breaks in a dashboard, not in an incident.
Tools to automate and report each step
You don’t have to do any of this by hand. Each step has tooling that both applies the control and reports on its state, so you can wire it into CI or a recurring job:
- Day 1 — RBAC audit: kubectl-who-can, rbac-tool (visualize and lint bindings), and KubiScan to hunt risky roles and tokens.
- Day 1 — API audit logging: native kube-apiserver flags (`--audit-policy-file`, `--audit-log-path`); ship the logs to Loki or your SIEM for alerting.
- Day 2 — Pod Security: PSA is built in; Polaris scores workloads against the Restricted standard and produces a dashboard/report.
- Day 3 — Image scanning: Trivy or Grype as a CI gate — see my walkthrough on scanning Docker images with Trivy and scanning images on your local machine.
- Day 4 — NetworkPolicies: Cilium Hubble or Calico flow logs to see the flows before you deny them; netpol-analyzer to check connectivity from manifests.
- Day 5 — Secrets/etcd encryption: native `EncryptionConfiguration`; the External Secrets Operator to report drift against an external store.
- Day 6–7 — Runtime + posture: Falco for runtime alerts, Trivy Operator and Kubescape for continuous, scheduled scans with exportable reports.
1. RBAC: Least Privilege from Day One
Role-Based Access Control is Kubernetes’ primary authorization mechanism. Most clusters fail at RBAC not because RBAC itself is weak (it denies anything that is not explicitly granted), but because grants accumulate: bindings get widened to unblock work, and nobody reviews them systematically.
Common RBAC Mistakes
- Binding to `cluster-admin` for convenience. Almost no workload needs cluster-admin. Use namespaced roles wherever possible.
- Using `*` verbs or resources in roles. Wildcard permissions are almost always broader than intended, and they also match resource types added to the cluster later.
- Not auditing ServiceAccount token usage. Every pod runs as a ServiceAccount (the namespace’s `default` one if none is specified), and custom workloads often get over-permissive SAs.
- Forgetting `automountServiceAccountToken: false`. If a workload does not need to talk to the Kubernetes API, disable token mounting entirely. This single setting breaks step 2 of the attack chain above.
Practical RBAC Patterns
For a workload that only needs to read ConfigMaps in its own namespace:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: my-app
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-configmap-reader
  namespace: my-app
subjects:
  - kind: ServiceAccount
    name: my-app
    namespace: my-app
roleRef:
  kind: Role
  name: configmap-reader
  apiGroup: rbac.authorization.k8s.io
```

For a workload that doesn’t call the API at all (here a hypothetical `my-worker`), disable token automount on its ServiceAccount. Don’t do this for `my-app` itself, because it needs its token to read ConfigMaps:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-worker
  namespace: my-app
automountServiceAccountToken: false
```

Audit existing RBAC with kubectl-who-can or rbac-tool to find overly permissive bindings before attackers do. A useful one-liner lists every subject that can read Secrets in any namespace: `kubectl who-can get secrets --all-namespaces`.
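The `configmap-reader` RoleBinding only takes effect for pods that actually run as the `my-app` ServiceAccount. The sketch below shows how a Deployment might wire that up. The image reference and replica count are placeholders, not values from a real setup.

```yaml
# Sketch only: image and replica count are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Pods authenticate to the API as this ServiceAccount, so the
      # configmap-reader RoleBinding defines what they are allowed to do.
      serviceAccountName: my-app
      containers:
        - name: app
          image: registry.company.com/my-app:1.0.0
```

If `serviceAccountName` is omitted, the pods run as the namespace’s `default` ServiceAccount, and the Role does not apply to them.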
2. Pod Security Standards (and Migrating off PodSecurityPolicy)
PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement is Pod Security Admission (PSA), a built-in admission controller that enforces one of three Pod Security Standards profiles at the namespace level:
- Privileged — No restrictions. For system components only.
- Baseline — Prevents the most critical privilege escalations: privileged containers, hostPID, hostIPC, hostNetwork, dangerous capabilities.
- Restricted — Enforces current hardening best practices. Requires running as non-root, dropping all capabilities, disallowing privilege escalation, and setting a seccomp profile (`RuntimeDefault` or `Localhost`).
Enable enforcement at the namespace level with labels:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.30
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.30
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.30
```

A pod that runs as root or requests host-network in a namespace enforcing `restricted` will be rejected at admission. The `warn` and `audit` modes let you test before enforcing. For a full walkthrough of how PSA evaluates pods and how to roll it out, see my guide on understanding Pod Security Admission.
Migrating from PodSecurityPolicy to PSA
If you’re still on a cluster that used PSP, the migration path is:
- Map your PSPs to the closest PSA level. Most “restricted” PSPs map to `restricted`; permissive ones to `baseline`. The official `pspmigrator` tool can suggest mappings.
- Label every namespace in `warn`/`audit` mode matching that level, with no enforcement yet.
- Watch the audit logs and warnings for a release cycle. Fix the workloads that would be rejected (add `securityContext`, drop capabilities).
- Flip to `enforce` namespace by namespace, starting with the least critical.
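During the observation phase of that migration, a namespace might carry only the `warn` and `audit` labels, so nothing is rejected yet. Here is a sketch that assumes a hypothetical `payments` namespace with `baseline` as its target level:

```yaml
# Observation only: violations produce warnings for the client and annotations
# in the audit log, but no pod is rejected because there is no enforce label.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit: baseline
    pod-security.kubernetes.io/audit-version: latest
```

Once the warnings stop, add `pod-security.kubernetes.io/enforce: baseline` to complete the flip for that namespace.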
PSA is intentionally coarse-grained — three levels, namespace-scoped. For anything finer (per-team registries, required labels, custom mutation), you need a policy engine, which is the next section.
3. Policy Engines: Kyverno vs OPA Gatekeeper
Once you outgrow PSA’s three levels, you need an admission policy engine. The two standards are Kyverno and OPA Gatekeeper, and choosing between them is one of the most common platform decisions.
| | Kyverno | OPA Gatekeeper |
|---|---|---|
| Policy language | YAML (Kubernetes-native) | Rego (purpose-built DSL) |
| Learning curve | Low — looks like other manifests | Steep — Rego is its own paradigm |
| Mutation support | Yes, first-class | Limited |
| Image verification (Cosign) | Built-in | Via external data |
| Best when | Team wants fast adoption, K8s-only | Team already runs OPA across the stack |
For most teams without existing Rego expertise, Kyverno is significantly faster to adopt and maintain. A Kyverno policy to require all images come from your private registry:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Images must come from registry.company.com"
        pattern:
          spec:
            # =(...) is a conditional anchor: these checks apply only when the field is present,
            # so init and ephemeral containers can't be used to slip in a public image
            =(initContainers):
              - image: "registry.company.com/*"
            =(ephemeralContainers):
              - image: "registry.company.com/*"
            containers:
              - image: "registry.company.com/*"
```

Both integrate cleanly with GitOps — store policies in Git, apply via Argo CD or Flux, and you get an auditable history of every policy change. I’ve written several deep dives on this: Kyverno: enforcing standard and custom policies, extending Kyverno with custom rules, and running the Kyverno CLI in CI/CD with GitHub Actions — or browse everything under the policies tag.
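For comparison, Gatekeeper users usually express the same registry rule as a Constraint against the `K8sAllowedRepos` template from the community gatekeeper-library. This sketch assumes that ConstraintTemplate is already installed in the cluster. The constraint name is illustrative.

```yaml
# Requires the K8sAllowedRepos ConstraintTemplate from gatekeeper-library.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: restrict-image-registries
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    # The template does a prefix match on the image string; the trailing slash
    # stops lookalike hosts such as registry.company.com.evil.io from passing.
    repos:
      - "registry.company.com/"
```

The Rego that does the matching lives in the ConstraintTemplate, so the Constraint itself stays declarative. This template/constraint split is the main structural difference from a self-contained Kyverno ClusterPolicy.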
4. Network Policies: Micro-Segmentation
By default, every pod in a Kubernetes cluster can communicate with every other pod across all namespaces. This flat network model gives attackers unrestricted lateral movement once they compromise any workload (step 4 of the attack chain).
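Network Policies define L3/L4 allow-rules for pod-to-pod communication. They are enforced by your CNI (Container Network Interface) plugin (Calico, Cilium, Weave; not Flannel, which does not support NetworkPolicy). With a CNI that lacks support, the API server still accepts NetworkPolicy objects, but nothing enforces them.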
Default Deny Pattern
Start by denying all ingress and egress in a namespace, then open only what is explicitly needed:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

Then allow specific traffic:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 5432
```

The DNS Trap (the #1 reason default-deny “breaks everything”)
The most common NetworkPolicy support ticket: “I applied default-deny and the whole namespace stopped working.” The cause is almost always DNS. A default-deny egress policy blocks the pod from reaching kube-dns, so every name resolution fails and applications appear to hang or crash-loop.
Always pair default-deny egress with an explicit DNS allow rule:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Roll default-deny out in one namespace first, confirm DNS and required egress work, then expand. Tools like Cilium’s Hubble or Calico’s flow logs make it much easier to see exactly which flows you need to allow.
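`default-deny-all` denies egress as well as ingress, so the `allow-api-to-db` policy earlier in this section only opens the database side. The `api` pods also need permission to send that traffic. Below is a sketch of the matching egress policy, reusing the same labels and port. The policy name is illustrative.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-egress-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  # Listing only Egress keeps this policy from also isolating the api pods for ingress.
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
```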
5. Secrets Management
Kubernetes Secrets are base64-encoded, not encrypted. They are stored in etcd in plain text by default. Anyone with get permission on Secrets can read them. This is not a vulnerability — it is a design decision that puts the responsibility on you to:
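- Enable encryption at rest for etcd. Configure `EncryptionConfiguration` with an encryption provider (AES-CBC, AES-GCM, or a KMS provider) so Secrets are encrypted before being written to etcd.
- Use external secret stores. HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault keep the source of truth outside the cluster. Note that the External Secrets Operator syncs values into ordinary Kubernetes Secrets. If values must never be stored in Kubernetes at all, inject them into the pod at runtime instead (for example with the Vault Agent Injector).
- Restrict Secret RBAC aggressively. Never give `list` on Secrets cluster-wide; it returns every Secret’s value, not just its name. Use `get` on named resources where possible.
- Avoid environment variables for secrets. Prefer volume mounts. Env vars are inherited by every child process, are visible in container inspect output on the node, and can leak through crash dumps and application logging.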
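```yaml
# etcd encryption at rest: a file passed to kube-apiserver via --encryption-provider-config
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider encrypts new writes. The Kubernetes docs favor a kms (v2)
      # provider for production; aescbc keeps the key in a file on the control plane host.
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      # identity lets the API server still read Secrets written before encryption was enabled
      - identity: {}
```

Existing Secrets are only encrypted the next time they are written. Rewrite them all once with `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`. For the full external-store pattern, see the guide on injecting secrets into pods with HashiCorp Vault.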
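The sketch below follows the volume-mount advice above: a pod that reads a Secret from a read-only file rather than from an environment variable. The Secret name `db-credentials`, the image, and the mount path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
  namespace: production
spec:
  containers:
    - name: api
      image: registry.company.com/api:1.0.0
      volumeMounts:
        - name: db-credentials
          mountPath: /etc/secrets/db   # each Secret key becomes a file, e.g. /etc/secrets/db/password
          readOnly: true
  volumes:
    - name: db-credentials
      secret:
        secretName: db-credentials
        defaultMode: 0400   # files readable by the owner only
```

Unlike env vars, files in a Secret volume are refreshed in place when the Secret changes, except when the volume is mounted via `subPath`.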
6. Image Security and Supply Chain
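Your runtime security posture is only as good as the images you run. A compromised image from a public registry puts attacker-controlled code inside your cluster from its first second of execution; runtime controls can at best detect or contain it after it starts.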
Scan images in CI
Use Trivy, Grype, or Snyk to scan images as part of CI. Block deployments of images with critical CVEs (Common Vulnerabilities and Exposures — publicly catalogued security flaws). I’ve covered the practical side of this in scanning Docker images with Trivy, scanning your images locally before they ship, and a broader roundup of open-source development security tools.
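If your CI runs on GitHub Actions, you can express the same gate as a workflow step using the `aquasecurity/trivy-action` action. This sketch assumes a Dockerfile at the repository root and uses a placeholder image name. In real pipelines, pin the action to a release tag or commit SHA rather than a branch.

```yaml
# Hypothetical .github/workflows/image-scan.yml; image name and Dockerfile location are assumptions.
name: image-scan
on:
  pull_request:
  push:
    branches: [main]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t "registry.company.com/my-app:${{ github.sha }}" .
      - name: Fail on critical vulnerabilities
        # Pin to a release tag or commit SHA instead of a branch in real pipelines.
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: "registry.company.com/my-app:${{ github.sha }}"
          severity: CRITICAL
          exit-code: "1"   # non-zero exit fails the job, blocking the merge or deploy
```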
```bash
# In your CI pipeline: exit non-zero (failing the job) if any CRITICAL vulnerability is found
trivy image --exit-code 1 --severity CRITICAL your-image:tag
```

Use a private registry with admission control
Only allow images from your private registry using an admission webhook (Kyverno, OPA Gatekeeper) — the policy in section 3 does exactly this. It prevents developers from running arbitrary public images in production.
Use distroless or minimal base images
Distroless images contain only the application and its runtime dependencies — no shell, no package manager, no debugging tools. This drastically reduces both the attack surface and the CVE count. Google’s distroless images are available for Java, Node.js, Python, and Go. (Related: debugging distroless containers when you do need to inspect one.)
Sign and verify images (and the SLSA angle)
Cosign (from the Sigstore project) lets you sign container images and verify signatures at admission time using Kyverno or Connaisseur. This prevents image-substitution attacks where an attacker replaces a legitimate image in your registry.
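Here is a sketch of that admission check in Kyverno. It assumes images are signed with a Cosign key pair. The policy name and registry pattern are illustrative, and the public key is a placeholder.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30   # signature checks call out to the registry
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.company.com/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <contents of your cosign.pub>
                      -----END PUBLIC KEY-----
```

In recent Kyverno releases, `verifyImages` also rewrites the image reference to its digest by default (`mutateDigest`). The image that runs is then pinned to the digest of the image whose signature was verified.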
If you’re being asked about supply-chain compliance, the framework to know is SLSA (Supply-chain Levels for Software Artifacts). The practical progression: SLSA L1 = you have a build provenance document; L2 = it’s signed and the build is hosted; L3 = the build is hardened and non-falsifiable. Generating provenance in CI (on GitHub Actions, for example, with the SLSA project’s slsa-github-generator or GitHub’s artifact attestations) and verifying it at admission with Cosign + Kyverno gets you most of the way to L2/L3 without a platform rebuild.
7. Runtime Security
Runtime security detects and responds to malicious activity after a container is running. The primary tool is Falco — a CNCF project that uses eBPF (extended Berkeley Packet Filter — a Linux kernel technology for running sandboxed observability programs) to monitor system calls and raise alerts when containers behave unexpectedly.
Default Falco rules catch common attack patterns:
- Shell spawned in a container
- Network connection to an unexpected IP
- Write to a sensitive file path (`/etc/passwd`, `/etc/shadow`)
- Privilege escalation via setuid binaries
- Container drift (new executable files written at runtime)
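Falco rules are plain YAML, so you can add your own next to the defaults. The default ruleset already ships a similar shell-detection rule. This sketch narrows the idea to one namespace to show the rule format. The rule name and namespace are illustrative.

```yaml
- rule: Shell in production container
  desc: An interactive shell was started inside a container in the production namespace
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh, ash)
    and k8s.ns.name = "production"
  output: >
    Shell in container (user=%user.name ns=%k8s.ns.name pod=%k8s.pod.name
    container=%container.name image=%container.image.repository cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```

`spawned_process` and `container` are macros from Falco’s default rules, so this file has to be loaded alongside them.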
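Combine Falco with seccomp profiles to restrict the system calls a container can make at the kernel level. The RuntimeDefault seccomp profile blocks dozens of system calls that containers virtually never need (the exact list depends on your container runtime). Pods that don’t set a profile run Unconfined unless the kubelet’s `--seccomp-default` flag (GA since Kubernetes 1.27) is enabled to make RuntimeDefault the node-wide default.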
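```yaml
# Pod spec fragment that satisfies the Restricted standard
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: 65534   # conventionally the "nobody" user
        capabilities:
          drop: ["ALL"]
```

Together with the pod-level `seccompProfile`, the settings `allowPrivilegeEscalation: false`, `runAsNonRoot: true`, and `capabilities.drop: ["ALL"]` satisfy the Kubernetes Restricted pod security standard and make container escape significantly harder. `readOnlyRootFilesystem: true` is not required by Restricted, but it stops an attacker from writing tools into the container’s root filesystem. When Pod Security Admission enforces these requirements, they close step 5 of the attack chain: the attacker’s privileged, `hostPID` pod is rejected at admission.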
8. API Server Hardening
The API server is the most critical component to harden. Key settings:
- Disable anonymous authentication. `--anonymous-auth=false` ensures every request is authenticated. First check that nothing depends on anonymous access, such as liveness probes hitting `/livez` on kubeadm-built clusters.
- Enable audit logging. Log all API server requests to a file or webhook. Without audit logs, you cannot investigate incidents or detect RBAC abuse.
- Enable the right admission plugins. Ensure `NodeRestriction` is enabled: it limits each kubelet to modifying its own Node object and the Pods bound to its node.
- Do not expose the API server to the internet. Use a VPN, bastion host, or private endpoint. If you must expose it, restrict access by IP.
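On a kubeadm-style control plane, these settings live in the kube-apiserver static Pod manifest. Below is a trimmed sketch. It assumes the audit policy shown next is saved at `/etc/kubernetes/audit-policy.yaml`; all paths are assumptions, and only the relevant entries are shown.

```yaml
# Trimmed excerpt of /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm layout assumed).
# Only audit- and admission-related entries are shown; keep every flag your cluster already sets.
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --enable-admission-plugins=NodeRestriction
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit/audit.log
        - --audit-log-maxage=30      # days to retain rotated log files
        - --audit-log-maxbackup=10   # number of rotated files to keep
        - --audit-log-maxsize=100    # megabytes before a file is rotated
      volumeMounts:
        - name: audit-policy
          mountPath: /etc/kubernetes/audit-policy.yaml
          readOnly: true
        - name: audit-log
          mountPath: /var/log/kubernetes/audit/
  volumes:
    - name: audit-policy
      hostPath:
        path: /etc/kubernetes/audit-policy.yaml
        type: File
    - name: audit-log
      hostPath:
        path: /var/log/kubernetes/audit/
        type: DirectoryOrCreate
```

The kubelet restarts the API server when its static Pod manifest changes. On managed services (EKS, GKE, AKS), audit logging is configured through the provider’s settings instead.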
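```yaml
# Minimal audit policy. Secrets and ConfigMaps are logged at Metadata level only:
# logging their request bodies would copy secret values into the audit log.
# RBAC changes get full request/response bodies; everything else gets metadata.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  - level: Metadata
    omitStages: ["RequestReceived"]
```

9. etcd Security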
etcd stores all cluster state. Treat it as sensitive as your production database:
- Enable TLS for all etcd communication — both peer (etcd-to-etcd) and client (apiserver-to-etcd) with mutual TLS.
- Restrict network access to etcd. It should only be reachable by the API server. Use firewall rules or security groups.
- Enable encryption at rest (see Secrets section).
- Back up etcd regularly. A snapshot is a complete copy of all cluster state, including all Secrets. Encrypt backups and store them separately from the cluster.
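On a kubeadm-built control plane, the mutual-TLS settings above are flags in the etcd static Pod manifest. Below is a trimmed sketch. The certificate paths follow kubeadm’s defaults and are assumptions for other installers.

```yaml
# Trimmed excerpt of /etc/kubernetes/manifests/etcd.yaml (kubeadm layout and cert paths assumed).
spec:
  containers:
    - name: etcd
      command:
        - etcd
        # Client side (kube-apiserver -> etcd): require certificates signed by the etcd CA
        - --client-cert-auth=true
        - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        # Peer side (etcd <-> etcd): mutual TLS between cluster members
        - --peer-client-cert-auth=true
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
        - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
```

The API server presents its client certificate to etcd through its own `--etcd-cafile`, `--etcd-certfile`, and `--etcd-keyfile` flags.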
10. Multi-Tenancy Isolation
If multiple teams or customers share a cluster, namespace boundaries alone are not a security boundary — they’re an organizational one. Hardening multi-tenant clusters adds requirements on top of everything above:
- Namespace-per-tenant with ResourceQuotas and LimitRanges to prevent noisy-neighbor and resource-exhaustion DoS.
- NetworkPolicies that deny cross-namespace traffic by default, so tenant A cannot reach tenant B’s pods.
- A policy engine enforcing per-tenant rules (allowed registries, required labels, no `hostPath`).
- Separate node pools for untrusted workloads, or a sandboxed runtime (gVisor, Kata Containers) when you run genuinely untrusted code.
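The first two items above translate into a small set of per-tenant objects. Here is a sketch for a hypothetical `tenant-a` namespace. All quota and limit numbers are placeholders to size per tenant.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
# Default requests/limits, so pods that omit them aren't rejected by the quota above
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-a-defaults
  namespace: tenant-a
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 500m
        memory: 512Mi
---
# Only pods in this namespace may connect; ingress from other namespaces is denied
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
```

Shared components that live in other namespaces, such as an ingress controller or monitoring, need their own explicit allow rules on top of this.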
For hard multi-tenancy (untrusted tenants), the honest answer is that vanilla namespaces aren’t enough — consider virtual clusters (vCluster) or separate clusters entirely. Soft multi-tenancy (trusted internal teams) is well served by the controls in this guide.
11. Benchmarks and Continuous Posture
CIS Kubernetes Benchmark
The CIS Kubernetes Benchmark is a comprehensive checklist covering the control plane, nodes, and workloads. Running kube-bench gives you a scored assessment:
```bash
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
# Wait for the one-shot Job to finish before reading its output
kubectl wait --for=condition=complete job/kube-bench --timeout=300s
kubectl logs $(kubectl get pods -l app=kube-bench -o name)
```

kube-bench outputs PASS/FAIL/WARN for each control with remediation guidance. Run it after initial cluster setup and after major configuration changes.
Continuous scanning with Trivy Operator / Kubescape
Kubescape and the Trivy Operator provide continuous security scanning of live cluster state — not just a one-time audit. They check workloads against NSA/CISA hardening guidelines, the MITRE ATT&CK framework, and the CIS benchmark in real time.
```bash
helm repo add aquasecurity https://aquasecurity.github.io/helm-charts/
helm repo update
helm install trivy-operator aquasecurity/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  --set="trivy.ignoreUnfixed=true"
```

Trivy Operator stores its findings as VulnerabilityReport, ConfigAuditReport, and RbacAssessmentReport custom resources next to the resources it scans. It also exposes metrics that summarize those reports, which Prometheus can scrape to feed a security dashboard in Grafana.
Security Hardening Checklist
- ✅ RBAC reviewed — no wildcard roles, no unnecessary cluster-admin bindings
- ✅ ServiceAccount token automount disabled for workloads that do not need API access
- ✅ Pod Security Standards enforced at namespace level (at least Baseline, Restricted where possible)
- ✅ Policy engine (Kyverno/Gatekeeper) enforcing registry, label, and mutation rules
- ✅ Network policies deployed — default deny with explicit allows (including DNS!)
- ✅ Secrets encrypted at rest in etcd or moved to an external store
- ✅ Images scanned in CI — no critical CVEs in production
- ✅ Private registry enforced via admission control
- ✅ Image signing + verification (Cosign) and build provenance (SLSA)
- ✅ Container securityContext hardened (non-root, read-only fs, no capabilities)
- ✅ seccomp RuntimeDefault profile enabled
- ✅ API server audit logging enabled, anonymous auth disabled
- ✅ etcd TLS and network access restricted
- ✅ Multi-tenancy isolation (quotas, cross-namespace deny) if shared
- ✅ kube-bench run and critical/high findings remediated
- ✅ Runtime security (Falco) deployed and alerts routed to on-call
- ✅ Continuous scanning (Trivy Operator or Kubescape) deployed
FAQ
Where do I start if my cluster has no security controls today?
Follow the first-week roadmap above. The short version: audit RBAC (revoke stray cluster-admin), enable Pod Security Admission in warn mode on all namespaces, and deploy image scanning + Trivy Operator. These give immediate visibility and stop the most common privilege escalations without breaking anything.
Does enabling Network Policies break DNS resolution?
Yes — this is the single most common failure. A default-deny egress policy blocks pods from reaching kube-dns, so name resolution fails. Add an egress rule allowing UDP and TCP port 53 to the kube-system namespace whenever you apply default-deny (see the DNS allow policy above).
Should I use OPA Gatekeeper or Kyverno?
Both enforce admission policies. Kyverno is Kubernetes-native (policies are YAML) while Gatekeeper uses Rego. For teams without Rego expertise, Kyverno is faster to adopt and supports mutation and Cosign verification out of the box. Choose Gatekeeper if you already run OPA elsewhere and want one policy language across your stack.
What replaced PodSecurityPolicy?
Pod Security Admission (PSA), enabled by default since Kubernetes 1.23 and stable since 1.25. It enforces three profiles (privileged/baseline/restricted) via namespace labels. For finer-grained control than PSA’s three levels, add Kyverno or Gatekeeper.
Is Kubernetes certified for PCI-DSS or SOC 2?
Kubernetes itself is not certified — your configuration and the controls you implement determine compliance. The CIS Kubernetes Benchmark maps to many PCI-DSS and SOC 2 requirements. Managed offerings (EKS, GKE, AKS) carry their own compliance certifications for the underlying infrastructure.
How often should I update Kubernetes for security patches?
Apply a patch release within 30 days for High/Critical CVEs. Minor version upgrades (e.g., 1.30 → 1.31) should happen within the support window: the Kubernetes project maintains the three most recent minor versions, and once your version falls out of that window it stops receiving security patches entirely.
Are namespaces a security boundary?
No. Namespaces are an organizational boundary. Real isolation between tenants requires NetworkPolicies, ResourceQuotas, a policy engine, and — for untrusted workloads — sandboxed runtimes (gVisor/Kata) or separate clusters.
For a deeper look at how security fits into the broader Kubernetes platform architecture, see the Kubernetes architecture patterns guide and the guide on building a security-first Kubernetes culture.