The container works fine in CI. It deploys successfully to staging. Then something goes wrong in production and you type the command you always type: kubectl exec -it my-pod -- /bin/bash. The response is immediate: OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory.
You try /bin/sh. Same error. You try ls. Same error. The container image is distroless — it ships only your application binary and its runtime dependencies, with no shell, no package manager, no debugging tools of any kind. This is intentional and correct from a security standpoint. It is also a significant operational challenge the first time you face it in production.
This article covers every practical technique for debugging distroless containers in Kubernetes: kubectl debug with ephemeral containers (the standard approach), pod copy strategy (for Kubernetes versions without ephemeral container support, or when you need to modify the running pod spec), debug image variants (the pragmatic developer shortcut), cdebug (a purpose-built tool that simplifies the process), and node-level debugging (the last resort with the most power). For each technique I will explain what it can and cannot do, what Kubernetes version or RBAC permissions it requires, and in which scenario — developer in local, platform engineer in staging, ops in production — it is the appropriate choice.
Why Distroless Breaks the Normal Debugging Workflow
Traditional container debugging assumes you can exec into the container and use shell tools: ps, netstat, strace, curl, a text editor. Distroless images remove all of this by design. The Google distroless project, Chainguard’s Wolfi-based images, and the broader minimal image ecosystem deliberately exclude everything that is not required to run the application. The result is a dramatically smaller attack surface: no shell means no RCE via shell injection, no package manager means no easy escalation path, fewer binaries means fewer CVEs in the image scan.
The tradeoff is operational: when something goes wrong, the tools you would normally reach for simply are not there. A Java application in gcr.io/distroless/java17-debian12 has the JRE and nothing else. A Go binary compiled with CGO disabled and shipped in gcr.io/distroless/static-debian12 has literally only the binary plus the necessary CA certificates and timezone data. There is no wget to download a debug binary, no apt to install one, no bash to run a script.
Kubernetes solves this at the platform level with ephemeral containers, added as stable in Kubernetes 1.25. The principle is that a debug container — which can have a full shell and any tools you want — can be injected into a running pod and share its process namespace, network namespace, and filesystem mounts without modifying the original container or restarting the pod.
Option 1: kubectl debug with Ephemeral Containers
Ephemeral containers are the canonical solution. Since Kubernetes 1.25 (stable), kubectl debug can inject a temporary container into a running pod. The container shares the target pod’s network namespace by default, and with --target it can also share the process namespace of a specific container, allowing you to inspect its running processes and open file descriptors.
The basic invocation is:

```shell
kubectl debug -it my-pod \
  --image=busybox:latest \
  --target=my-container
```

The --target flag is the critical piece. Without it, the ephemeral container gets its own process namespace. With it, it shares the process namespace of the specified container — meaning you can run ps aux and see the application's processes, use ls -la /proc/<pid>/fd to inspect open file descriptors, and read the application's environment via cat /proc/<pid>/environ.
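One wrinkle: /proc/<pid>/environ is NUL-separated, so plain cat runs the variables together. Translating the NULs to newlines makes it readable — a minimal sketch, shown against /proc/self so it can be run anywhere; inside the ephemeral container, substitute the application's PID (typically 1 with --target):

```shell
# Environment variables are NUL-separated in /proc; print one per line.
# Replace "self" with the target application's PID (usually 1).
tr '\0' '\n' < /proc/self/environ
```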
For a more capable debug environment, replace busybox with a richer image:

```shell
kubectl debug -it my-pod \
  --image=nicolaka/netshoot \
  --target=my-container
```

nicolaka/netshoot includes tcpdump, curl, dig, nmap, ss, iperf3, and dozens of other network diagnostic tools, making it the standard choice for network debugging scenarios.
What You Can and Cannot Do
Ephemeral containers share the pod's network namespace and, when --target is used, the process namespace. This gives you:

- Full visibility into the application's network traffic from inside the pod (tcpdump, ss, netstat)
- Process inspection via /proc/<pid> — open files, memory maps, environment variables, CPU/memory usage
- Access to the pod's DNS resolution context — exactly the same /etc/resolv.conf the application sees
- Ability to make outbound network calls from the same network namespace (testing service endpoints, DNS resolution)
What you do not get with ephemeral containers:

- Access to the application container's filesystem. The ephemeral container has its own root filesystem. You cannot cat /app/config.yaml from the application container's filesystem unless you access it via /proc/<pid>/root/.
- Ability to remove the container once added. Ephemeral containers are permanent until the pod is deleted. This is by design — the Kubernetes API does not allow removing them after creation.
- Volume mount modifications via the CLI. You cannot add volume mounts to an ephemeral container via kubectl debug (though the API spec supports it, the CLI does not expose this).
- Resource limits. Ephemeral containers do not support resource requests and limits in the kubectl debug CLI, though this is evolving.
Accessing the Application Filesystem
The most common surprise for developers new to ephemeral containers is that they cannot directly browse the application container's filesystem. The workaround is the /proc filesystem:

```shell
# Find the application's PID
ps aux

# Browse its filesystem via /proc
ls /proc/1/root/app/
cat /proc/1/root/etc/config.yaml

# Or chroot into the application's root
chroot /proc/1/root /bin/sh  # only if /bin/sh exists in the app image
```

The /proc/<pid>/root path is a symlink to the container's root filesystem as seen from the process namespace. Because the ephemeral container shares the process namespace with --target, the application's PID is typically 1, and /proc/1/root gives you read access to its filesystem.
RBAC Requirements
Ephemeral containers require the pods/ephemeralcontainers subresource permission. This is separate from pods/exec, which controls kubectl exec. A common mistake is to grant pods/exec for debugging purposes without realizing that ephemeral containers require an additional grant:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ephemeral-debugger
rules:
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["update", "patch"]
- apiGroups: [""]
  resources: ["pods/attach"]
  verbs: ["create", "get"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
```

In production environments, this permission should be tightly scoped: time-limited via RoleBinding rather than a permanent ClusterRoleBinding, restricted to specific namespaces, and ideally gated behind an approval workflow. The debug container runs as root by default, which can create privilege escalation paths if the application container runs as a non-root user with a shared process namespace — the debug container can attach to the application's processes with higher privileges.
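As one concrete shape of that scoping, the ClusterRole above can be granted through a namespaced RoleBinding rather than a ClusterRoleBinding — a sketch with hypothetical namespace and user names:

```yaml
# Hypothetical binding: scopes ephemeral-debugger to one namespace.
# The namespace and subject names are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ephemeral-debugger-oncall
  namespace: payments
subjects:
- kind: User
  name: oncall@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: ephemeral-debugger
  apiGroup: rbac.authorization.k8s.io
```

Deleting the RoleBinding when the session ends — or creating it with a short lifetime via automation — keeps the grant time-limited.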
Option 2: kubectl debug --copy-to (Pod Copy Strategy)
When you need to modify the pod's container spec — replace the image, change environment variables, add a sidecar with a shared filesystem — the --copy-to flag creates a full copy of the pod with your modifications applied:

```shell
kubectl debug my-pod \
  -it \
  --copy-to=my-pod-debug \
  --image=my-app:debug \
  --share-processes
```

This creates a new pod named my-pod-debug that is a copy of my-pod but with the container image replaced by my-app:debug. If my-app:debug is your application image built with debug tooling included (or a debug variant from your registry), this lets you interact with the exact same binary in the exact same configuration as the original pod.
A more common use of --copy-to is to attach a debug container alongside the existing application container while keeping the original image unchanged:

```shell
kubectl debug my-pod \
  -it \
  --copy-to=my-pod-debug \
  --image=busybox \
  --share-processes \
  --container=debugger
```

This creates the copy pod with both the original containers and a new debugger container sharing the process namespace. Unlike ephemeral containers, this approach supports volume mounts and resource limits, and the debug pod can be deleted cleanly when you are done.
Limitations of the Copy Strategy
The pod copy approach has a critical limitation: it is not debugging the original pod. It creates a new pod that may behave differently because:

- It does not share the original pod's in-memory state — if the issue is a goroutine leak or heap corruption that has been accumulating for hours, the fresh copy will not exhibit it immediately
- It gets a new pod UID, which means any admission webhooks, network policies, or pod-level security contexts that depend on pod identity may apply differently
- If the original pod is crashing (CrashLoopBackOff), the copy will also crash — this technique does not help for crash debugging unless you also change the entrypoint
For crash debugging specifically, combine --copy-to with a modified entrypoint to keep the container alive:

```shell
kubectl debug my-crashing-pod \
  -it \
  --copy-to=my-pod-debug \
  --image=busybox \
  --share-processes \
  -- sleep 3600
```

Option 3: Debug Image Variants
The most pragmatic approach — and the one most appropriate for developer workflows — is to maintain a debug variant of your application image that includes shell tooling. Both the Google distroless project and Chainguard provide this pattern officially.
Google distroless images have a :debug tag that adds BusyBox to the image:

```dockerfile
# Production image
FROM gcr.io/distroless/java17-debian12

# Debug variant — identical but with a BusyBox shell
FROM gcr.io/distroless/java17-debian12:debug
```

Chainguard images follow a similar convention with :latest-dev variants that include apk, a shell, and common utilities:
```dockerfile
# Production (zero shell, minimal footprint)
FROM cgr.dev/chainguard/go:latest

# Development/debug variant
FROM cgr.dev/chainguard/go:latest-dev
```

If you build your own base images, the recommended approach is to use multi-stage builds and maintain separate build targets:
```dockerfile
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp .

# Production: static distroless image
FROM gcr.io/distroless/static-debian12 AS production
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]

# Debug variant: same binary, with shell tools
FROM gcr.io/distroless/static-debian12:debug AS debug
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]
```

In your CI/CD pipeline, build both targets and push my-app:${VERSION} (production) and my-app:${VERSION}-debug (debug variant) to your registry. The debug image is never deployed to production by default, but it exists and is ready to be used with kubectl debug --copy-to when needed.
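As a sketch of that pipeline step — GitHub Actions syntax here, with hypothetical job and image names; adapt to your CI system:

```yaml
# Hypothetical CI job: build both targets from the same Dockerfile
# so production and debug images always contain the same binary.
build-images:
  runs-on: ubuntu-latest
  env:
    VERSION: ${{ github.sha }}   # or a tag from your release process
  steps:
    - uses: actions/checkout@v4
    - name: Build production image
      run: docker build --target production -t my-app:${VERSION} .
    - name: Build debug variant
      run: docker build --target debug -t my-app:${VERSION}-debug .
```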
Security Considerations for Debug Variants
Debug image variants defeat much of the security benefit of distroless if they are used in production, even temporarily. Track usage carefully: log when debug images are deployed, require explicit approval, and ensure they are removed after the debugging session. In regulated environments, consider whether deploying a debug variant to production namespaces is permitted by your security policy — in many cases it is not, and you must use ephemeral containers (which add a debug process to the pod without modifying the application image) instead.
Option 4: cdebug
cdebug is an open-source CLI tool that simplifies distroless debugging by wrapping kubectl debug with more ergonomic defaults and additional capabilities. Its primary value is in making ephemeral container debugging feel like a native shell experience:

```shell
# Install
brew install cdebug
# or: go install github.com/iximiuz/cdebug@latest

# Debug a running pod
cdebug exec -it my-pod

# Specify a namespace and container
cdebug exec -it -n production my-pod -c my-container

# Use a specific debug image
cdebug exec -it my-pod --image=nicolaka/netshoot
```

What cdebug adds over raw kubectl debug:
- Automatic filesystem chroot. cdebug exec automatically sets the filesystem root of the debug container to the target container's filesystem, so you browse / and see the application's files — not the debug image's files. This addresses the most common friction point with kubectl debug.
- Docker integration. cdebug exec works identically for Docker containers (cdebug exec -it <container-id>), making it the same muscle memory for local and cluster debugging.
- No RBAC complications for Docker-based local development — useful for developer workflows before the code reaches Kubernetes.
The tradeoff: cdebug is a third-party dependency and requires installation. In environments with strict tooling policies (regulated industries, air-gapped clusters), it may not be an option. In those cases, the raw kubectl debug workflow with /proc/1/root filesystem navigation is the baseline.
Option 5: Node-Level Debugging
When everything else fails — the pod is in CrashLoopBackOff too fast to attach to, the issue is a kernel-level problem, or you need tools like strace that require elevated privileges — node-level debugging gives you direct access to the container’s processes from the host node.
kubectl debug node/<node-name> creates a privileged pod on the target node that mounts the node's root filesystem under /host:

```shell
kubectl debug node/my-node-name \
  -it \
  --image=nicolaka/netshoot
```

From this privileged pod, you can use nsenter to enter the namespaces of any container running on the node:
```shell
# Find the container's PID on the node
# (from within the node debug pod)
crictl ps | grep my-container
crictl inspect <container-id> | grep pid

# Enter the container's namespaces
nsenter -t <pid> -m -u -i -n -p -- /bin/sh

# Or just the network namespace (for network debugging)
nsenter -t <pid> -n -- ip a
```

The nsenter approach lets you run tools from the node's or debug container's toolset while operating in the namespaces of the target container. This is how you run strace against a distroless process: strace is not in the application container, but because the node debug pod shares the host PID namespace, you can target the application's PID directly:

```shell
# Trace network-related syscalls of the application process
# (run from the node debug pod; ptrace works across the host PID namespace)
strace -p <pid> -f -e trace=network
```

RBAC and Security for Node Debugging
Node-level debugging requires nodes/proxy and the ability to create privileged pods, which in most production clusters is restricted to cluster administrators. The debug pod runs with hostPID: true and hostNetwork: true, giving it visibility into all processes and network traffic on the node — not just the target container. This is significant: every process running on the node, including those in other tenants’ namespaces, is visible.
This technique should be treated as a break-glass procedure: log the access, require dual approval in production environments, and clean up immediately after the debugging session with kubectl delete pod --selector=app=node-debugger.
Choosing the Right Approach: Access Profile and Environment Matrix
The technique you should use depends on two axes: who you are (developer, platform engineer, ops/SRE) and where the issue is (local development, staging, production). The requirements and constraints differ significantly across these combinations.
Developer — Local or Development Cluster
Goal: Reproduce and understand a bug, inspect configuration, verify network connectivity to services.
Constraints: None material — full cluster admin on local or personal dev namespace.
Recommended approach: Debug image variants or cdebug.
In local development (Minikube, Kind, Docker Desktop), the fastest path is to build the debug variant of your image and deploy it directly. If you are working with another team’s service, cdebug exec gives you a shell in the container with automatic filesystem root without any special RBAC. The goal is speed and iteration — reserve the more structured approaches for higher environments.
Developer — Staging Cluster
Goal: Debug integration issues, inspect live configuration, verify environment-specific behavior.
Constraints: Shared cluster — cannot deploy arbitrary workloads to other teams’ namespaces, but has pods/ephemeralcontainers in own namespace.
Recommended approach: kubectl debug with ephemeral containers (--target), scoped to own namespace.
Staging is where ephemeral containers earn their keep. You can attach to a running pod without restarting it, without modifying the deployment spec, and without affecting other users of the same cluster. Grant developers pods/ephemeralcontainers in their team’s namespaces and they can self-service debug without needing ops involvement.
Platform Engineer / SRE — Production
Goal: Diagnose a live production incident. The pod is behaving unexpectedly — high latency, memory growth, unexpected connections, incorrect responses.
Constraints: Changes to running pods are high-risk. Any debug image deployment must be gated. The issue is live and affecting users.
Recommended approach: kubectl debug with ephemeral containers (ephemeral containers do not restart the pod, do not modify the deployment, and are auditable via API audit logs).
The key production requirements are auditability and minimal blast radius. Ephemeral containers satisfy both: they are recorded in the Kubernetes API audit log (who attached, when, to which pod), they do not modify the running application container, and they are limited to the pod’s own network and process namespaces. Document the debug session in your incident ticket: pod name, time, what was observed, who ran the debug container.
The --copy-to strategy is generally inappropriate for production incident response: it creates a new pod that may or may not exhibit the issue, it adds load to the cluster during an incident, and if it is attached to the same services (databases, downstream APIs), it produces additional traffic that complicates forensics.
Platform Engineer — Production, Node-Level Issue
Goal: Diagnose a kernel-level issue, a container runtime problem, a networking issue that spans multiple pods, or a situation where the pod is crashing too fast to attach to.
Constraints: Maximum privilege required. High operational risk.
Recommended approach: Node-level debug pod with nsenter. Treat as break-glass.
For this scenario, create a dedicated RBAC role that grants nodes/proxy access and the ability to create pods with hostPID: true in a dedicated debug namespace. Bind it only to specific users, require a separate authentication step (e.g., kubectl auth can-i check against a time-limited binding), and log all access. This level of access should generate a PagerDuty-style alert so that the security team knows a privileged debug session is active in production.
Common Errors and Solutions
Error: “ephemeral containers are disabled for this cluster”
Ephemeral containers require Kubernetes 1.16+ (alpha, behind feature gate) and are stable from 1.25. If you are on 1.16–1.22, you need to enable the EphemeralContainers feature gate on the API server and kubelet. From 1.23 it was beta and enabled by default. From 1.25 it is stable and always on. On managed Kubernetes services (EKS, GKE, AKS), check the cluster version — versions older than 1.25 may still have it disabled depending on your configuration.
Error: “cannot update ephemeralcontainers” (RBAC)
You have pods/exec but not pods/ephemeralcontainers. Add the grant shown in the RBAC section above. Note that pods/exec and pods/ephemeralcontainers are separate subresources — having one does not imply the other.
Error: “container not found” with --target
The container name in --target must match exactly the container name as defined in the Pod spec — not the image name. Check with kubectl get pod my-pod -o jsonpath='{.spec.containers[*].name}' to get the exact container names.
Error: Can see processes but cannot read /proc/1/root
The application container runs as a non-root user (e.g., UID 1000) and the ephemeral container runs as root. The application’s filesystem may have files owned by UID 1000 that are not readable by other UIDs depending on permissions. The /proc/<pid>/root path itself requires CAP_SYS_PTRACE capability. If your cluster’s PodSecurityStandards (PSS) are set to restricted, the debug container may not have this capability. Use the Baseline PSS profile for debug namespaces or explicitly add SYS_PTRACE to the ephemeral container’s securityContext.
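Because kubectl debug does not expose capabilities directly, adding SYS_PTRACE means writing to the pod's ephemeralcontainers subresource yourself — for example via kubectl patch with the --subresource flag available in recent kubectl versions. An illustrative fragment of such a patch (container and target names are placeholders):

```yaml
# Illustrative patch body for the pod's ephemeralcontainers subresource;
# this is an assumption-level sketch, not a kubectl debug flag.
spec:
  ephemeralContainers:
  - name: debugger
    image: busybox:latest
    targetContainerName: my-container
    stdin: true
    tty: true
    securityContext:
      capabilities:
        add: ["SYS_PTRACE"]
```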
Error: tcpdump shows no traffic
When using nicolaka/netshoot for network debugging, ensure the ephemeral container is created without --target if your goal is to capture all traffic on the pod’s network interface (not just the specific container’s process). With --target, you share the process namespace but the network namespace is shared at the pod level regardless. Run tcpdump -i any to capture on all interfaces including loopback, which is where inter-container traffic within a pod travels.
Decision Framework
Use this as a starting point to select the right technique for your situation:
| Scenario | Technique | Requirement |
|---|---|---|
| Active production incident, pod running | kubectl debug + ephemeral container | pods/ephemeralcontainers RBAC, k8s 1.25+ |
| Pod crashing too fast to attach | kubectl debug --copy-to + modified entrypoint | Ability to create pods in namespace |
| Developer debugging in dev/staging | cdebug exec or kubectl debug | pods/ephemeralcontainers or pod create |
| Need full filesystem access | kubectl debug --copy-to + debug image variant | Debug image in registry, pod create |
| Need strace or kernel tracing | Node-level debug with nsenter | nodes/proxy, cluster admin equivalent |
| Network packet capture | kubectl debug + nicolaka/netshoot | pods/ephemeralcontainers |
| Local Docker debugging | cdebug exec <container-id> | Docker socket access |
| CI-reproducible debug environment | Debug image variant in separate build target | Separate image tag in registry |
Production RBAC Design
A clean RBAC design for production distroless debugging separates three roles with different privilege levels:
```yaml
# Tier 1: Developer self-service in team namespaces
# Allows attaching ephemeral containers, no node access
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: distroless-debugger
  namespace: team-namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["update", "patch"]
- apiGroups: [""]
  resources: ["pods/attach"]
  verbs: ["create", "get"]
---
# Tier 2: SRE production incident access
# Ephemeral containers across all namespaces
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sre-distroless-debugger
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["update", "patch"]
- apiGroups: [""]
  resources: ["pods/attach"]
  verbs: ["create", "get"]
---
# Tier 3: Break-glass node access
# Only for platform team, time-limited binding recommended
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-debugger
rules:
- apiGroups: [""]
  resources: ["nodes/proxy"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "get", "list", "delete"]
# Restrict to debug namespace via RoleBinding, not ClusterRoleBinding
```

Bind Tier 1 permanently to your developers. Bind Tier 2 to SREs permanently but with audit alerts on use. Bind Tier 3 only on-demand (via a Kubernetes operator that creates time-limited RoleBindings) and never as a permanent ClusterRoleBinding.
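For Tier 3, the on-demand grant might look like the following — a hypothetical, incident-scoped RoleBinding (names and namespace are placeholders) that is created at the start of a break-glass session and deleted when it ends:

```yaml
# Hypothetical break-glass binding: namespaced, incident-scoped,
# removed (manually or by an operator) after the session.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: node-debugger-incident-4312   # tie the name to the incident ticket
  namespace: debug                    # dedicated debug namespace
subjects:
- kind: User
  name: oncall-sre@example.com        # placeholder identity
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: node-debugger
  apiGroup: rbac.authorization.k8s.io
```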
Summary
Distroless containers are the correct choice for production workloads. They reduce attack surface, eliminate unnecessary CVEs, and force a cleaner separation between application and tooling. The operational cost is that your traditional debugging workflow — exec into the container, run some commands — no longer works by default.
Kubernetes provides a clean answer with ephemeral containers and kubectl debug: inject a debug container with whatever tools you need into the running pod, sharing its network and process namespaces, without restarting or modifying the application. For scenarios where ephemeral containers are insufficient — filesystem access, crash debugging, kernel-level investigation — the copy strategy and node-level debug fill the remaining gaps.
The key to making this work at scale is not the technique itself but the access model: developers get self-service ephemeral container access in their own namespaces, SREs get cluster-wide ephemeral container access for production incidents, and node-level access is a break-glass procedure with audit trail and time limits. With that model in place, distroless becomes an operational non-issue rather than an obstacle.