Debugging Distroless Containers: kubectl debug, Ephemeral Containers, and When to Use Each


The container works fine in CI. It deploys successfully to staging. Then something goes wrong in production and you type the command you always type: kubectl exec -it my-pod -- /bin/bash. The response is immediate: OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory.

You try /bin/sh. Same error. You try ls. Same error. The container image is distroless — it ships only your application binary and its runtime dependencies, with no shell, no package manager, no debugging tools of any kind. This is intentional and correct from a security standpoint. It is also a significant operational challenge the first time you face it in production.

This article covers every practical technique for debugging distroless containers in Kubernetes: kubectl debug with ephemeral containers (the standard approach), pod copy strategy (for Kubernetes versions without ephemeral container support, or when you need to modify the running pod spec), debug image variants (the pragmatic developer shortcut), cdebug (a purpose-built tool that simplifies the process), and node-level debugging (the last resort with the most power). For each technique I will explain what it can and cannot do, what Kubernetes version or RBAC permissions it requires, and in which scenario — developer in local, platform engineer in staging, ops in production — it is the appropriate choice.

Why Distroless Breaks the Normal Debugging Workflow

Traditional container debugging assumes you can exec into the container and use shell tools: ps, netstat, strace, curl, a text editor. Distroless images remove all of this by design. The Google distroless project, Chainguard’s Wolfi-based images, and the broader minimal image ecosystem deliberately exclude everything that is not required to run the application. The result is a dramatically smaller attack surface: no shell means no RCE via shell injection, no package manager means no easy escalation path, fewer binaries means fewer CVEs in the image scan.

The tradeoff is operational: when something goes wrong, you cannot use the tools that the process itself is not allowed to run. A Java application in gcr.io/distroless/java17-debian12 has the JRE and nothing else. A Go binary compiled with CGO disabled and shipped in gcr.io/distroless/static-debian12 has literally only the binary and the necessary CA certificates and timezone data. There is no wget to download a debug binary, no apt to install one, no bash to run a script.

Kubernetes solves this at the platform level with ephemeral containers, added as stable in Kubernetes 1.25. The principle is that a debug container — which can have a full shell and any tools you want — can be injected into a running pod and share its process namespace, network namespace, and filesystem mounts without modifying the original container or restarting the pod.

Option 1: kubectl debug with Ephemeral Containers

Ephemeral containers are the canonical solution. Since Kubernetes 1.25 (stable), kubectl debug can inject a temporary container into a running pod. The container shares the target pod’s network namespace by default, and with --target it can also share the process namespace of a specific container, allowing you to inspect its running processes and open file descriptors.

The basic invocation is:

kubectl debug -it my-pod \
  --image=busybox:latest \
  --target=my-container

The --target flag is the critical piece. Without it, the ephemeral container gets its own process namespace. With it, it shares the process namespace of the specified container — meaning you can run ps aux and see the application’s processes, use ls -la /proc/<pid>/fd to inspect open file descriptors, and read the application’s environment via cat /proc/<pid>/environ.

For a more capable debug environment, replace busybox with a richer image:

kubectl debug -it my-pod \
  --image=nicolaka/netshoot \
  --target=my-container

nicolaka/netshoot includes tcpdump, curl, dig, nmap, ss, iperf3, and dozens of other network diagnostic tools, making it the standard choice for network debugging scenarios.

What You Can and Cannot Do

Ephemeral containers share the pod’s network namespace and, when --target is used, the process namespace. This gives you:

  • Full visibility into the application’s network traffic from inside the pod (tcpdump, ss, netstat)
  • Process inspection via /proc/<pid> — open files, memory maps, environment variables, CPU/memory usage
  • Access to the pod’s DNS resolution context — exactly the same /etc/resolv.conf the application sees
  • Ability to make outbound network calls from the same network namespace (testing service endpoints, DNS resolution)

What you do not get with ephemeral containers:

  • Access to the application container’s filesystem. The ephemeral container has its own root filesystem. You cannot cat /app/config.yaml from the application container’s filesystem unless you access it via /proc/<pid>/root/.
  • Ability to remove the container once added. Ephemeral containers are permanent until the pod is deleted. This is by design — the Kubernetes API does not allow removing them after creation.
  • Volume mount modifications via CLI. You cannot add volume mounts to an ephemeral container via kubectl debug (though the API spec supports it, the CLI does not expose this).
  • Resource limits. Ephemeral containers do not support resource requests and limits in the kubectl debug CLI, though this is evolving.
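For completeness, the API-level shape shows why the CLI gap exists: the EphemeralContainer spec does accept volumeMounts, so a raw patch of the pods/ephemeralcontainers subresource can do what kubectl debug cannot. A hedged sketch (container and volume names are hypothetical, and the volume must already exist in the pod spec):

```yaml
# Fragment of a pod's ephemeralContainers list, applied via the
# pods/ephemeralcontainers subresource. kubectl debug cannot set
# volumeMounts, but the API accepts them.
spec:
  ephemeralContainers:
  - name: debugger
    image: busybox:latest
    command: ["sh"]
    stdin: true
    tty: true
    targetContainerName: my-container
    volumeMounts:
    - name: config-volume     # hypothetical; must reference an existing pod volume
      mountPath: /mnt/config
```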

Accessing the Application Filesystem

The most common surprise for developers new to ephemeral containers is that they cannot directly browse the application container’s filesystem. The workaround is the /proc filesystem:

# Find the application's PID
ps aux

# Browse its filesystem via /proc
ls /proc/1/root/app/
cat /proc/1/root/etc/config.yaml

# Or set the root to the application's root
chroot /proc/1/root /bin/sh  # only if /bin/sh exists in the app image

The /proc/<pid>/root path is a symlink to the container’s root filesystem as seen from the process namespace. Because the ephemeral container shares the process namespace with --target, the application’s PID is typically 1, and /proc/1/root gives you full read access to its filesystem.
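One practical wrinkle: /proc/<pid>/environ separates variables with NUL bytes, so cat prints them as one unreadable run. Piping through tr fixes that. A small sketch of the technique on a synthetic file in the same format the kernel uses (the in-pod invocation is shown as a comment):

```shell
# /proc/<pid>/environ is NUL-separated; tr makes it readable.
# Synthetic demo using the same byte format:
printf 'PORT=8080\0LOG_LEVEL=debug\0' > /tmp/environ-demo
tr '\0' '\n' < /tmp/environ-demo
# prints:
#   PORT=8080
#   LOG_LEVEL=debug

# Inside the ephemeral container, the real invocation would be:
#   tr '\0' '\n' < /proc/1/environ
```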

RBAC Requirements

Ephemeral containers require the pods/ephemeralcontainers subresource permission. This is separate from pods/exec, which controls kubectl exec. A common mistake is to grant pods/exec for debugging purposes without realizing that ephemeral containers require an additional grant:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ephemeral-debugger
rules:
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["update", "patch"]
- apiGroups: [""]
  resources: ["pods/attach"]
  verbs: ["create", "get"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]

In production environments, this permission should be tightly scoped: time-limited via RoleBinding rather than permanent ClusterRoleBinding, restricted to specific namespaces, and ideally gated behind an approval workflow. The debug container runs as root by default, which can create privilege escalation paths if the application container runs as a non-root user with shared process namespace — the debug container can attach to the application’s processes with higher privileges.
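Binding that ClusterRole through a namespaced RoleBinding (rather than a ClusterRoleBinding) is the tighter-scoped default; subject and namespace names below are hypothetical:

```yaml
# Grants the ephemeral-debugger ClusterRole only within one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ephemeral-debugger-binding
  namespace: team-namespace
subjects:
- kind: User
  name: jane.doe@example.com        # hypothetical subject
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: ephemeral-debugger
  apiGroup: rbac.authorization.k8s.io
```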

Option 2: kubectl debug --copy-to (Pod Copy Strategy)

When you need to modify the pod’s container spec — replace the image, change environment variables, add a sidecar with a shared filesystem — the --copy-to flag creates a full copy of the pod with your modifications applied:

kubectl debug my-pod \
  -it \
  --copy-to=my-pod-debug \
  --image=my-app:debug \
  --share-processes

This creates a new pod named my-pod-debug that is a copy of my-pod but with the container image replaced by my-app:debug. If my-app:debug is your application image built with debug tooling included (or a debug variant from your registry), this lets you interact with the exact same binary in the exact same configuration as the original pod.

A more common use of --copy-to is to attach a debug container alongside the existing application container while keeping the original image unchanged:

kubectl debug my-pod \
  -it \
  --copy-to=my-pod-debug \
  --image=busybox \
  --share-processes \
  --container=debugger

This creates the copy-pod with both the original containers and a new debugger container sharing the process namespace. Unlike ephemeral containers, this approach supports volume mounts and resource limits, and the debug pod can be deleted cleanly when you are done.

Limitations of the Copy Strategy

The pod copy approach has a critical limitation: it is not debugging the original pod. It creates a new pod that may behave differently because:

  • It does not share the original pod’s in-memory state — if the issue is a goroutine leak or heap corruption that has been accumulating for hours, the fresh copy will not exhibit it immediately
  • It creates a new Pod UID, which means any admission webhooks, network policies, or pod-level security contexts that depend on pod identity may apply differently
  • If the original pod is crashing (CrashLoopBackOff), the copy will also crash — this technique does not help for crash debugging unless you also change the entrypoint

For crash debugging specifically, combine --copy-to with a modified entrypoint to keep the container alive:

kubectl debug my-crashing-pod \
  -it \
  --copy-to=my-pod-debug \
  --image=busybox \
  --share-processes \
  -- sleep 3600

Option 3: Debug Image Variants

The most pragmatic approach — and the one most appropriate for developer workflows — is to maintain a debug variant of your application image that includes shell tooling. Both the Google distroless project and Chainguard provide this pattern officially.

Google distroless images have a :debug tag that adds BusyBox to the image:

# Production image
FROM gcr.io/distroless/java17-debian12

# Debug variant — identical but with BusyBox shell
FROM gcr.io/distroless/java17-debian12:debug

Chainguard images follow a similar convention with :latest-dev variants that include apk, a shell, and common utilities:

# Production (zero shell, minimal footprint)
FROM cgr.dev/chainguard/go:latest

# Development/debug variant
FROM cgr.dev/chainguard/go:latest-dev

If you build your own base images, the recommended approach is to use multi-stage builds and maintain separate build targets:

FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp .

# Production: static distroless image
FROM gcr.io/distroless/static-debian12 AS production
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]

# Debug variant: same binary, with shell tools
FROM gcr.io/distroless/static-debian12:debug AS debug
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]

In your CI/CD pipeline, build both targets and push my-app:${VERSION} (production) and my-app:${VERSION}-debug (debug variant) to your registry. The debug image is never deployed to production by default, but it exists and is ready to be used with kubectl debug --copy-to when needed.
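A minimal sketch of the corresponding CI step, assuming the multi-stage Dockerfile above and a docker-compatible builder (the image name and version are hypothetical):

```shell
# Derive both tags from one version, then build each Dockerfile target.
VERSION=1.4.2                       # hypothetical; usually the git tag or SHA
PROD_TAG="my-app:${VERSION}"
DEBUG_TAG="my-app:${VERSION}-debug"
echo "building ${PROD_TAG} and ${DEBUG_TAG}"

# In CI (shown as comments here):
#   docker build --target production -t "${PROD_TAG}" .
#   docker build --target debug      -t "${DEBUG_TAG}" .
#   docker push "${PROD_TAG}" && docker push "${DEBUG_TAG}"
```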

Security Considerations for Debug Variants

Debug image variants defeat much of the security benefit of distroless if they are used in production, even temporarily. Track usage carefully: log when debug images are deployed, require explicit approval, and ensure they are removed after the debugging session. In regulated environments, consider whether deploying a debug variant to production namespaces is permitted by your security policy — in many cases it is not, and you must use ephemeral containers (which add a debug process to the pod without modifying the application image) instead.

Option 4: cdebug

cdebug is an open-source CLI tool that simplifies distroless debugging by wrapping kubectl debug with more ergonomic defaults and additional capabilities. Its primary value is in making ephemeral container debugging feel like a native shell experience:

# Install
brew install cdebug
# or: go install github.com/iximiuz/cdebug@latest

# Debug a running pod
cdebug exec -it my-pod

# Specify a namespace and container
cdebug exec -it -n production my-pod -c my-container

# Use a specific debug image
cdebug exec -it my-pod --image=nicolaka/netshoot

What cdebug adds over raw kubectl debug:

  • Automatic filesystem chroot. cdebug exec automatically sets the filesystem root of the debug container to the target container’s filesystem, so you browse / and see the application’s files — not the debug image’s files. This addresses the most common friction point with kubectl debug.
  • Docker integration. cdebug exec works identically for Docker containers (cdebug exec -it <container-id>), making it the same muscle memory for local and cluster debugging.
  • No RBAC complications for Docker-based local development — useful for developer workflows before the code reaches Kubernetes.

The tradeoff: cdebug is a third-party dependency and requires installation. In environments with strict tooling policies (regulated industries, air-gapped clusters), it may not be an option. In those cases, the raw kubectl debug workflow with /proc/1/root filesystem navigation is the baseline.

Option 5: Node-Level Debugging

When everything else fails — the pod is in CrashLoopBackOff too fast to attach to, the issue is a kernel-level problem, or you need tools like strace that require elevated privileges — node-level debugging gives you direct access to the container’s processes from the host node.

kubectl debug node/ creates a privileged pod on the target node that mounts the node’s root filesystem under /host:

kubectl debug node/my-node-name \
  -it \
  --image=nicolaka/netshoot

From this privileged pod, you can use nsenter to enter the namespaces of any container running on the node:

# Find the container's PID on the node
# (from within the node debug pod)
crictl ps | grep my-container
crictl inspect <container-id> | grep pid

# Enter the container's namespaces
nsenter -t <pid> -m -u -i -n -p -- /bin/sh

# Or just the network namespace (for network debugging)
nsenter -t <pid> -n -- ip a

The nsenter approach lets you run tools from the node’s or debug container’s toolset while operating in the namespaces of the target container. This is how you run strace against a distroless process: strace is not in the application container, but you can run it from the node level while targeting the application’s PID.

# Trace all syscalls from the application process
nsenter -t <pid> -- strace -p <pid> -f -e trace=network

RBAC and Security for Node Debugging

Node-level debugging requires nodes/proxy and the ability to create privileged pods, which in most production clusters is restricted to cluster administrators. The debug pod runs with hostPID: true and hostNetwork: true, giving it visibility into all processes and network traffic on the node — not just the target container. This is significant: every process running on the node, including those in other tenants’ namespaces, is visible.

This technique should be treated as a break-glass procedure: log the access, require dual approval in production environments, and clean up immediately after the debugging session with kubectl delete pod --selector=app=node-debugger.

Choosing the Right Approach: Access Profile and Environment Matrix

The technique you should use depends on two axes: who you are (developer, platform engineer, ops/SRE) and where the issue is (local development, staging, production). The requirements and constraints differ significantly across these combinations.

Developer — Local or Development Cluster

Goal: Reproduce and understand a bug, inspect configuration, verify network connectivity to services.
Constraints: None material — full cluster admin on local or personal dev namespace.
Recommended approach: Debug image variants or cdebug.

In local development (Minikube, Kind, Docker Desktop), the fastest path is to build the debug variant of your image and deploy it directly. If you are working with another team’s service, cdebug exec gives you a shell in the container with automatic filesystem root without any special RBAC. The goal is speed and iteration — reserve the more structured approaches for higher environments.

Developer — Staging Cluster

Goal: Debug integration issues, inspect live configuration, verify environment-specific behavior.
Constraints: Shared cluster — cannot deploy arbitrary workloads to other teams’ namespaces, but has pods/ephemeralcontainers in own namespace.
Recommended approach: kubectl debug with ephemeral containers (--target), scoped to own namespace.

Staging is where ephemeral containers earn their keep. You can attach to a running pod without restarting it, without modifying the deployment spec, and without affecting other users of the same cluster. Grant developers pods/ephemeralcontainers in their team’s namespaces and they can self-service debug without needing ops involvement.

Platform Engineer / SRE — Production

Goal: Diagnose a live production incident. The pod is behaving unexpectedly — high latency, memory growth, unexpected connections, incorrect responses.
Constraints: Changes to running pods are high-risk. Any debug image deployment must be gated. The issue is live and affecting users.
Recommended approach: kubectl debug with ephemeral containers (ephemeral containers do not restart the pod, do not modify the deployment, and are auditable via API audit logs).

The key production requirements are auditability and minimal blast radius. Ephemeral containers satisfy both: they are recorded in the Kubernetes API audit log (who attached, when, to which pod), they do not modify the running application container, and they are limited to the pod’s own network and process namespaces. Document the debug session in your incident ticket: pod name, time, what was observed, who ran the debug container.

The --copy-to strategy is generally inappropriate for production incident response: it creates a new pod that may or may not exhibit the issue, it adds load to the cluster during an incident, and if it is attached to the same services (databases, downstream APIs), it produces additional traffic that complicates forensics.

Platform Engineer — Production, Node-Level Issue

Goal: Diagnose a kernel-level issue, a container runtime problem, a networking issue that spans multiple pods, or a situation where the pod is crashing too fast to attach to.
Constraints: Maximum privilege required. High operational risk.
Recommended approach: Node-level debug pod with nsenter. Treat as break-glass.

For this scenario, create a dedicated RBAC role that grants nodes/proxy access and the ability to create pods with hostPID: true in a dedicated debug namespace. Bind it only to specific users, require a separate authentication step (e.g., kubectl auth can-i check against a time-limited binding), and log all access. This level of access should generate a PagerDuty-style alert so that the security team knows a privileged debug session is active in production.
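There is no native expiry on RBAC bindings, so the time limit has to come from tooling. A sketch of what an operator-managed break-glass binding might look like (the annotation convention and the operator that enforces it are hypothetical):

```yaml
# Break-glass binding in a dedicated debug namespace.
# The expiry annotation is NOT enforced by Kubernetes itself; a hypothetical
# cleanup operator or CronJob would delete the binding after the deadline.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: node-debugger-breakglass
  namespace: debug
  annotations:
    example.com/expires-at: "2024-06-01T12:00:00Z"   # hypothetical convention
subjects:
- kind: User
  name: oncall.sre@example.com      # hypothetical subject
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: node-debugger
  apiGroup: rbac.authorization.k8s.io
```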

Common Errors and Solutions

Error: “ephemeral containers are disabled for this cluster”

Ephemeral containers require Kubernetes 1.16+ (alpha, behind feature gate) and are stable from 1.25. If you are on 1.16–1.22, you need to enable the EphemeralContainers feature gate on the API server and kubelet. From 1.23 it was beta and enabled by default. From 1.25 it is stable and always on. On managed Kubernetes services (EKS, GKE, AKS), check the cluster version — versions older than 1.25 may still have it disabled depending on your configuration.

Error: “cannot update ephemeralcontainers” (RBAC)

You have pods/exec but not pods/ephemeralcontainers. Add the grant shown in the RBAC section above. Note that pods/exec and pods/ephemeralcontainers are separate subresources — having one does not imply the other.

Error: “container not found” with --target

The container name in --target must match exactly the container name as defined in the Pod spec — not the image name. Check with kubectl get pod my-pod -o jsonpath='{.spec.containers[*].name}' to get the exact container names.

Error: Can see processes but cannot read /proc/1/root

Process visibility comes from the shared process namespace, but traversing /proc/<pid>/root additionally requires ptrace access to the target process (CAP_SYS_PTRACE). If your cluster’s Pod Security Standards (PSS) are set to restricted, the debug container may not have this capability; use the Baseline PSS profile for debug namespaces or explicitly add SYS_PTRACE to the ephemeral container’s securityContext. UID mismatches can also bite: if the debug container runs as a non-root UID different from the application’s (e.g., UID 1000), normal file permissions still apply inside /proc/<pid>/root and some files may be unreadable.
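Where policy allows it, the capability can be added explicitly. kubectl debug does not expose securityContext directly (newer kubectl releases offer --profile presets instead), so a raw patch of the ephemeral container spec is the fallback; a hedged sketch:

```yaml
# EphemeralContainer entry with SYS_PTRACE, so /proc/<pid>/root of the
# target process is traversable. Applied via pods/ephemeralcontainers.
- name: debugger
  image: busybox:latest
  stdin: true
  tty: true
  targetContainerName: my-container
  securityContext:
    capabilities:
      add: ["SYS_PTRACE"]
```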

Error: tcpdump shows no traffic

The pod’s network namespace is shared by all containers, so the ephemeral container sees the same interfaces as the application regardless of whether --target is set (--target affects only the process namespace). If tcpdump shows nothing, you are most likely capturing on the wrong interface: run tcpdump -i any to capture on all interfaces, including loopback, which is where inter-container traffic within a pod travels.

Decision Framework

Use this as a starting point to select the right technique for your situation:

Scenario | Technique | Requirement
Active production incident, pod running | kubectl debug + ephemeral container | pods/ephemeralcontainers RBAC, k8s 1.25+
Pod crashing too fast to attach | kubectl debug --copy-to + modified entrypoint | Ability to create pods in namespace
Developer debugging in dev/staging | cdebug exec or kubectl debug | pods/ephemeralcontainers or pod create
Need full filesystem access | kubectl debug --copy-to + debug image variant | Debug image in registry, pod create
Need strace or kernel tracing | Node-level debug with nsenter | nodes/proxy, cluster admin equivalent
Network packet capture | kubectl debug + nicolaka/netshoot | pods/ephemeralcontainers
Local Docker debugging | cdebug exec <container-id> | Docker socket access
CI-reproducible debug environment | Debug image variant in separate build target | Separate image tag in registry

Production RBAC Design

A clean RBAC design for production distroless debugging separates three roles with different privilege levels:

# Tier 1: Developer self-service in team namespaces
# Allows attaching ephemeral containers, no node access
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: distroless-debugger
  namespace: team-namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["update", "patch"]
- apiGroups: [""]
  resources: ["pods/attach"]
  verbs: ["create", "get"]
---
# Tier 2: SRE production incident access
# Ephemeral containers across all namespaces
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sre-distroless-debugger
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["update", "patch"]
- apiGroups: [""]
  resources: ["pods/attach"]
  verbs: ["create", "get"]
---
# Tier 3: Break-glass node access
# Only for platform team, time-limited binding recommended
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-debugger
rules:
- apiGroups: [""]
  resources: ["nodes/proxy"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "get", "list", "delete"]
  # Restrict to debug namespace via RoleBinding, not ClusterRoleBinding

Bind Tier 1 permanently to your developers. Bind Tier 2 to SREs permanently but with audit alerts on use. Bind Tier 3 only on-demand (via a Kubernetes operator that creates time-limited RoleBindings) and never as a permanent ClusterRoleBinding.

Summary

Distroless containers are the correct choice for production workloads. They reduce attack surface, eliminate unnecessary CVEs, and force a cleaner separation between application and tooling. The operational cost is that your traditional debugging workflow — exec into the container, run some commands — no longer works by default.

Kubernetes provides a clean answer with ephemeral containers and kubectl debug: inject a debug container with whatever tools you need into the running pod, sharing its network and process namespaces, without restarting or modifying the application. For scenarios where ephemeral containers are insufficient — filesystem access, crash debugging, kernel-level investigation — the copy strategy and node-level debug fill the remaining gaps.

The key to making this work at scale is not the technique itself but the access model: developers get self-service ephemeral container access in their own namespaces, SREs get cluster-wide ephemeral container access for production incidents, and node-level access is a break-glass procedure with audit trail and time limits. With that model in place, distroless becomes an operational non-issue rather than an obstacle.

Multi-Stage Dockerfiles Explained: Reduce Docker Image Size the Right Way


A multi-stage Dockerfile is the pattern to use to keep your Docker image at an optimized size. We have already covered the importance of keeping your image size to a minimum and the tools, such as dive, that help you understand the size of each of your layers. But today we are going to follow a different approach, and that approach is a multi-stage build for our Docker containers.

What is a Multi-Stage Dockerfile Pattern?

A multi-stage Dockerfile is based on the principle that the same Dockerfile can contain multiple FROM statements, and each FROM statement starts a new stage of the build.

Multi-Stage Dockerfile Pattern

Why Does the Multi-Stage Build Pattern Reduce the Size of Container Images?

The main reason multi-stage builds reduce image size is that you can copy an artifact, or a set of artifacts, from one stage to another. That is the key: everything you do not copy is discarded, so you stop carrying unneeded components from layer to layer into the final Docker image.

How do you define a Multi-Stage Dockerfile?

First, you need a Dockerfile with more than one FROM. As noted, each FROM marks the start of one stage of the multi-stage Dockerfile. To differentiate and reference the stages, you can name each one using the AS clause alongside the FROM instruction, as shown below:

 FROM eclipse-temurin:11-jre-alpine AS builder

As a best practice, you can also add a stage label with the same name you provided before, but that is not required. So, in a nutshell, a multi-stage Dockerfile will look something like this:

FROM eclipse-temurin:11-jre-alpine AS builder
LABEL stage=builder
COPY . /
RUN apk add --no-cache unzip zip && zip -qq -d /resources/bwce-runtime/bwce-runtime-2.7.2.zip "tibco.home/tibcojre64/*"
RUN unzip -qq /resources/bwce-runtime/bwce*.zip -d /tmp && rm -rf /resources/bwce-runtime/bwce*.zip 2> /dev/null

FROM eclipse-temurin:11-jre-alpine
RUN addgroup -S bwcegroup && adduser -S bwce -G bwcegroup

How do you copy resources from one stage to another?

This is the other important part here. Once we have defined all the stages we need, and each is doing its part of the job, we need to move data from one stage to the next. So, how can we do that?

The answer is the COPY command. COPY is the same command you use to copy data from your local context into the container image, so you need a way to indicate that this time the source is not your local storage but another stage. That is what the --from argument does: its value is the stage name declared with AS, as we learned in the previous section. A complete COPY command looks like the snippet shown below:

 COPY --from=builder /resources/ /resources/
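Putting the pieces together, a minimal end-to-end sketch of the pattern (the artifact path /resources/app.zip is hypothetical, standing in for the bwce-runtime archive above):

```dockerfile
# Stage 1: prepare artifacts; everything done here is discarded
# unless explicitly copied into a later stage.
FROM eclipse-temurin:11-jre-alpine AS builder
LABEL stage=builder
COPY . /
# /resources/app.zip is a hypothetical artifact name
RUN apk add --no-cache unzip \
 && unzip -qq /resources/app.zip -d /resources/app \
 && rm /resources/app.zip

# Stage 2: final image; only the extracted artifacts survive,
# not the unzip tooling or the intermediate archive.
FROM eclipse-temurin:11-jre-alpine
COPY --from=builder /resources/app /resources/app
```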

What is the Improvement you can get?

That is the essential part, and it will depend on how your Dockerfiles and images are built, but the primary factor to consider is the number of layers your current image has. The more layers there are, the more you can probably save in the size of the final container image with a multi-stage Dockerfile.

The main reason is that each layer duplicates part of the data, and you will rarely need all of a layer’s data in the next one. Using the approach described in this article, you get a way to optimize that.

Where can I read more about this?

If you want to read more, multi-stage Dockerfiles are documented as one of the best practices on the official Docker web page, and there is a great article about this by Alex Ellis that you can read here.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Hadolint Explained: Dockerfile Best Practices Using CLI, Docker, and VS Code


Introduction

Hadolint is an open-source tool that helps you ensure that every Dockerfile you create follows Dockerfile best practices, in an automated way. Hadolint, as the name already suggests, is a linter and, because of that, it can also teach you these best practices as you write Dockerfiles yourself. We already touched on it when covering the optimization of container image size, but today we are going to cover it more in-depth.

Hadolint is a small tool written in Haskell that parses the Dockerfile into an AST and applies rules on top of that AST. It stands on the shoulders of ShellCheck to lint the Bash code inside RUN instructions, as shown in the picture below:

Hadolint Explained: Dockerfile Best Practices Using CLI, Docker, and VS Code

There are several ways to run the tool, depending on what you are trying to achieve, and we will talk a little about the different options.

Running it as a standalone tool

The first way is to run it as a completely standalone tool that you can download from here, using the following command:

 hadolint <Dockerfile path>

It will run against the Dockerfile and show any issues found, as you can see in the picture below:

Hadolint execution

For each issue found, it shows the line where the problem was detected, the code of the Dockerfile best-practice check being performed (DL3020), the severity of the check (error, warning, info, and so on), and a description of the issue.

To see all the rules that are executed, check the GitHub Wiki; all of them are based on the Dockerfile best practices published by Docker on its official web page here.

For each rule, you will find a specific wiki page with all the information you need about the issue: why it is something that should be changed, and how to change it, as you can see in the picture below:

Hadolint GitHub Wiki page

Ignore Rules Capability

You can ignore some rules if you do not want them applied, either because of false positives or because the checks are not aligned with the Dockerfile best practices used in your organization. To do that, include an --ignore parameter for each rule:

 hadolint --ignore DL3003 --ignore DL3006 <Dockerfile>
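For ignores you want applied on every run and shared with the team, hadolint also reads a .hadolint.yaml configuration file from the directory it runs in, so the same two rules can be suppressed without CLI flags:

```yaml
# .hadolint.yaml — committed alongside the Dockerfile
ignored:
  - DL3003
  - DL3006
```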

Running it as Docker Container

The tool is also available as a Docker image in the following repositories:

docker pull hadolint/hadolint
# OR
docker pull ghcr.io/hadolint/hadolint

This makes it easy to introduce into your Continuous Integration and Continuous Deployment pipelines, or to use in your local environment if you prefer not to install software directly on your machine.
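As a sketch of the CI usage, a GitHub Actions job could look like the following; this assumes the community hadolint/hadolint-action, and the version tag and Dockerfile path are illustrative:

```yaml
# Hypothetical workflow: lint the repo's Dockerfile on every push
name: lint-dockerfile
on: [push]
jobs:
  hadolint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile
```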

Running it inside VS Code

As with any linter, it is essential to have it close to your development environment, and this one is no different. We want Dockerfile best-practice feedback right in the editor while we are typing, for two main reasons:

  • The sooner you see an issue, the faster you fix it, so the code quality stays high.
  • Once you know about an issue, you will not repeat it in future developments.

You will find Hadolint in the Extensions Marketplace, where you can install it:

Hadolint VS Code Extension


Once you have done that, each time you open a Dockerfile it will be validated against all these Dockerfile best practices, and the issues detected will show up in the Problems view, as you can see in the picture below:

Hadolint: VS Code Extension Execution

And those issues will be re-evaluated as soon as you modify and save the Dockerfile again, so you will always see a live view of the problems detected against the Dockerfile best practices.

From Docker Desktop to Rancher Desktop: Simple Migration Guide for Developers

As most of you already know, the 31st of January is the last day to use Docker Desktop without being subject to the new licensing model, which introduces a cost for most company usage. It is still free for open-source projects and small companies, but it is better to check the exact requirements in the official Docker documentation.

Because of that situation, I started a journey to find an alternative to Docker Desktop, which I use a lot. My primary use is to start up server-like things for temporary usage that I don't want installed on my machine, to keep it as clean as possible (even though this is not always true, it is at least an attempt).

During that search, I discovered Rancher Desktop, which was released not long ago and promised to be the most suitable alternative. The goal of this post is not to compare both platforms, but if you would like more information, I leave here a post that provides it:

The idea here is to talk about the journey of that migration. I installed Rancher Desktop 1.0.0 on my Mac, and the installation was very easy. The main difference from Docker Desktop is that Rancher Desktop is built with Kubernetes in mind, whereas for Docker Desktop that came as an afterthought. So, by default, we will have a Kubernetes environment running on our system, and we can even select the version of that cluster, as you can see in the picture below:

Rancher also noticed the window of opportunity in front of them, and they have been very aggressive in providing an easy migration path from Docker Desktop. The first thing you will notice is that you can configure Rancher Desktop to be compliant with the Docker CLI API, as you can see in the picture below.

This is not enabled by default, but it is very easy to do, and it means you will not need to change any of your "docker-like" commands (docker build, docker ps...), which smooths the transition a lot.

Maybe in the future you will want to move away from everything resembling Docker, even on the client side, and move to a containerd-based approach, but for now, what I needed was to simplify the process.

So, after enabling that and restarting Rancher Desktop, I can type my usual commands, as you can see in the picture below:

The only thing left is to migrate my images and containers. Because I am not a purist Docker user, I don't always follow the practice of keeping containers stateless and using volumes, especially for small, short-lived use cases. That means some of my containers also need to be moved to the new platform to avoid any data loss.

My migration journey had several steps:

  • First, commit the stateful containers that I need to keep on the new system, using the docker commit command; you can find its documentation here.
  • Then, export all the images I currently have into TAR files, using the docker save command; you can find its documentation here.
  • Finally, load all those images on the new system using the docker load command to have them available there. Again, you can find the documentation for that specific command here.
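The three steps above can be sketched for a single container; the container and image names used here (`mydb`) are hypothetical placeholders:

```shell
# Write the migration steps to a helper script for review; the first two
# commands run against Docker Desktop, the last one against Rancher Desktop.
cat > migrate-mydb.sh <<'EOF'
#!/bin/sh
set -e
docker commit mydb mydb:migrated        # snapshot the stateful container as an image
docker save mydb:migrated -o mydb.tar   # export the image to a portable tar file
# (switch your docker CLI over to Rancher Desktop, then:)
docker load -i mydb.tar                 # import the image on the new runtime
EOF
chmod +x migrate-mydb.sh
```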

To automate the process a little (even though I don't have many images loaded, because I try to clean up from time to time with the docker system prune command), I prefer not to do it manually, so I use a couple of simple scripts to do the job.

So, to perform the export job, I run the following command:

docker image ls -q | xargs -I {} docker image save {} -o {}.tar

This command saves each of my images to a separate tar file in the current folder. Now, I just need to run the following command from the same folder to load all the images into the new system:

find . -name "*.tar" -exec docker load -i {} \;

The reason why I’m not doing both actions at the same time is that I need to have running Docker Desktop for the first part and Rancher Desktop for the other. So even though I can automate that as well, I think it is not worth it.

And that’s it, now I can remove the Docker Desktop from my laptop, and my life will continue to be the same. I will try to provide more feedback on how it feels, especially regarding resource utilization and similar topics in the near future.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Reduce Docker Disk Usage Locally: Analyze and Clean Up Images, Containers, and Cache

Discover the options at your disposal to make efficient use of disk space in your Docker installation

The rise of the container has been a game-changer for all of us: not only on the server side, where pretty much any new workload is deployed in container form, but the same change is happening in our local environments.

We embrace containers to easily manage the different dependencies we need to handle as developers, even if the task at hand is not container-related. Do you need a database up and running? You use a containerized version of it. Do you need a messaging system to test one of your applications? You quickly start a container providing that functionality.

And as soon as we don't need them, they are killed, and the system stays as clean as it was before we started. But there are always things to handle even with a wonderful solution in front of us, and in the case of a local Docker environment, disk usage is one of the most critical.

This idea of launching new things over and over and then getting rid of them is only partially true, because all the images we have needed and all the containers we have launched are still there in our system, waiting for a new round and using our disk resources in the meantime. You can see this in a recent picture of my local Docker environment, with more than 60 GB used for that purpose.

Docker dashboard settings page image showing the amount of disk Docker is using.

The first thing we need to do is check what is using this amount of space, to see if we can release some of it. To do that, we can leverage the docker system df command the Docker CLI provides:

The output of the execution of the docker system df command

As you can see, of the 61 GB in use, 20.88 GB corresponds to the images I have, 21.03 MB to the containers I have defined, 1.25 GB to the local volumes, and 21.07 GB to the build cache. As only 18 of the 26 defined images are active, I can reclaim up to 9.3 GB, which is a significant amount.

If we would like more details about this data, we can always append the verbose option to the command, as you can see in the picture below:

Detailed verbose output of the docker system df -v command

So, after getting all this information, we can go ahead and prune the system. This will get rid of any unused containers and images in your system, and to execute it, you only need to type this:

docker system prune -af

It has several options to tune the execution a little, which you can check on the official Docker web page:
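As a sketch of the variants I find most useful, here they are collected in a reference script (all of these are destructive, so review it rather than running it blindly; the flags are the ones documented for the docker CLI):

```shell
# Reference script with a few prune variants; review before running.
cat > docker-cleanup.sh <<'EOF'
#!/bin/sh
set -e
docker system prune -af                       # unused containers, images, networks
docker system prune -af --volumes             # ...additionally remove unused volumes
docker system prune -af --filter "until=24h"  # only objects older than 24 hours
docker builder prune -f --keep-storage=10GB   # trim the build cache down to ~10GB
EOF
chmod +x docker-cleanup.sh
```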

In my case, that helped me recover up to 40.8 GB on my system, as you can see in the picture below.

Reduce Docker Disk Usage Locally: Analyze and Clean Up Images, Containers, and Cache

But if you would like to go one step further, you can also tune some properties that are taken into account when this prune is executed. For example, defaultKeepStorage lets you define how much disk you want to dedicate to the build cache, which optimizes the network usage when building images with common layers.

To do that, you need to include the following snippet in your Docker Engine configuration, as shown in the image below:

Docker Engine configuration with the defaultKeepStorage up to 20GB
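The snippet shown in the image is, roughly, the following fragment of the Docker Engine configuration (the 20GB value is just the one used in my setup; adjust it to your own disk budget):

```json
{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "20GB"
    }
  }
}
```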

I hope this housekeeping process helps your local environments shine again and get the most out of Docker without wasting a lot of resources in the process.

Portainer Explained: Evolution, Use Cases, and Why It Still Matters for Kubernetes

Discover the current state of one of the first graphical interfaces for Docker containers and how it provides a solution for modern container platforms

I want to start this article with a story that I am not sure all of you, incredible readers, know. There was a time when there were no graphical interfaces to monitor your containers. It was a long time ago, as far as "a long time" goes in the container world: maybe 2014-2015, when Kubernetes was in its initial stage and Docker Swarm had just been released and seemed the most reliable solution.

So most of us didn’t have a container platform as such. We just run our containers from our own laptops or small servers for cutting-edge companies using docker commands directly and without more help than the CLI tool. As you can see, things have changed a lot since then, and if you would like to refresh that view, you can check the article shared below:

And at that time, an open-source project provided the most incredible solution, because we didn't know we needed it until we used it, and that option was Portainer. Portainer provides an awesome web interface where you can see all the Docker containers deployed on your Docker host and deploy new ones, like any other platform.

Portainer: A Visionary Software and an Evolution Journey
Web page of portainer in 2017 from https://ostechnix.com/portainer-an-easiest-way-to-manage-docker/

It was the first of its kind and generated a tremendous impact; it even spawned a series of other projects described as "the Portainer of...", like dodo, "the Portainer of Kubernetes infrastructure" at that time.

But maybe you are asking: how is Portainer doing? Is Portainer still a thing? It is still alive and kicking, as you can see on its GitHub project page: https://github.com/portainer/portainer, with the latest release at the end of May 2021.

They now have a Business version, but still offer a Community Edition, which is the one I am going to analyze in more detail in another article. Still, I would like to provide some initial highlights:

  • The installation process still follows the same approach as the initial releases: it runs as another component of your cluster. The options to use it on Docker, Docker Swarm, or Kubernetes cover all the main solutions enterprises use.
  • It now provides a list of application templates, similar to the OpenShift Catalog, and you can also create your own. This is very useful for companies that rely on templates to give developers a common deployment approach without having to do all the work themselves.
Portainer 2.5.1 Application Template view
  • Team management capabilities let you define users with access to the platform and group those users into teams for more granular permission management.
  • Multi-registry support: by default it integrates with Docker Hub, but you can add your own registries as well and pull images from them directly from the GUI.
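For reference, installing Portainer CE on a plain Docker host is still a couple of commands; this is a sketch of the usual single-host setup (the port and volume name are the common defaults, so adjust them as needed):

```shell
# Write the usual single-host install to a helper script for review.
cat > install-portainer.sh <<'EOF'
#!/bin/sh
set -e
docker volume create portainer_data   # persistent storage for Portainer's own data
docker run -d --name portainer \
  -p 9000:9000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ce
EOF
chmod +x install-portainer.sh
```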

In summary, this is a great evolution of the Portainer tool, keeping the same spirit that old users loved: simplicity and focus on what an administrator or developer needs to know, while adding more features and capabilities to keep pace with the evolution of the container platform industry.


Scan Docker Images Locally for Vulnerabilities Using Snyk (DevSecOps Guide)

Learn how you can leverage the use of Snyk inside your Docker engine installation

Security is one of the most relevant topics in modern architecture, and it needs to be handled from all perspectives. Having a single team auditing the platforms and the software we build is not enough.

The introduction of DevSecOps as the new normal has made this clear: security teams and policies become part of the development process, so that security does not become a blocker for innovation and the artifacts we deploy are secure.

Docker image scanning is one of the most important practices for container images: it lets us know that all the internal components that are part of the image are free from known vulnerabilities. We usually rely on dedicated systems to do so.

I wrote an article about one of the most relevant open-source options (Harbor) for doing this job.

This is also being done by Docker registries from cloud providers, such as Amazon ECR, as of this year. But why wait until we push the images to an external Docker registry? Why can't we do it in our local environment?

Now we can. Version 2.5.0.1 of Docker Desktop includes the Snyk components needed to inspect Docker images directly from the command line:

https://www.docker.com/blog/combining-snyk-scans-in-docker-desktop-and-docker-hub-to-deploy-secure-containers/


Scanning Your Local Images

So, let’s start. Let’s open a new terminal and type the following command:

docker scan <image-name>

As soon as we type this, the command will tell us that the scanning process uses Snyk, and that we need to authorize access to those services for the scan to run.

After that, we get a list of all the vulnerabilities detected, as you can see in the picture below:

Vulnerability scanning
Vulnerability scanning using your local Docker client

For each of the vulnerabilities, you can see the following data:

Vulnerability info
Detailed information provided for each of the vulnerabilities detected

We get the library with the vulnerability, the severity level, and a short description. If you need more details, you can also check the provided URL, which links to a description page for that vulnerability:

Vulnerabilities page
Vulnerability detailed page from snyk

Finally, it also shows the source that introduces this library into your image, so the issue can be solved quickly.
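A few useful variants of the command, collected in a reference script (a sketch: `myimage:latest` is a placeholder, and the flags are the ones documented for docker scan):

```shell
# Reference script with docker scan variants; requires the Snyk-backed scan plugin.
cat > scan-image.sh <<'EOF'
#!/bin/sh
set -e
docker scan myimage:latest                                   # full report
docker scan --severity high myimage:latest                   # only high-severity findings
docker scan --file Dockerfile myimage:latest                 # map findings to Dockerfile lines
docker scan --file Dockerfile --exclude-base myimage:latest  # hide base-image CVEs
EOF
chmod +x scan-image.sh
```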

It provides a high-level view of the whole image too, as you can see here:

Overview of Docker images
Overview of your Docker images with all the vulnerabilities detected

So, now you don’t have any excuse to not have all your images safe and secure before pushing to your local repository. Let’s do it!


How to Analyze and Reduce Docker Image Size (Layer Analysis and Best Practices)

Find out how you can reduce the size of your Docker images for a better experience and savings inside your organization.

Containerization is the new normal; we are all aware of that. All new versions of corporate software and all the open-source projects include the option of using a Docker image to run their software.

You have probably already been doing tests, or even running production workloads, based on Docker images you have built yourself. If that is the case, you probably know one of the big challenges of this kind of task: how to optimize the size of the image you generate.

One of the main reasons a Docker image can be so big is that images are built following a layered concept. That means each image is created as a stack of layers, each associated with one of the commands in your Dockerfile.

Graphical explanation of how a Docker image is compounded.

Use dive to analyze the size of your images

dive is an open-source project that provides a detailed view of the composition of your Docker images. It is a command-line application with a great view of the content of each layer, as you can see in the picture below:

Dive execution of a BusinessWorks Container Edition image

The tool uses an ncurses-style interface (if you are old enough to remember how tools looked before graphical user interfaces were a thing, it should look familiar) and has these main features:

  • It shows the list of layers in the top-left of the screen, with the size associated with each of them.
  • It provides general stats about image efficiency (a percentage value), a view of the potential wasted space, and the image's total size.
  • For each layer selected, you get a view of the file system at that layer, with the size of each folder.
  • It also shows the biggest elements and how many times those objects are duplicated.

Now you have a tool that helps you understand how your image is built and measure the effect of each tweak you make to reduce its size. So, let's start with the tricks.
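Running dive is a single command; here is a sketch of the typical invocations collected in a reference script (the image name is a placeholder, and the containerized variant assumes the wagoodman/dive image):

```shell
# Reference script with typical dive invocations; review before running.
cat > analyze-image.sh <<'EOF'
#!/bin/sh
set -e
dive myimage:latest             # interactive layer-by-layer exploration
CI=true dive myimage:latest     # non-interactive mode, useful in pipelines
# Without installing dive locally, via its container image:
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  wagoodman/dive:latest myimage:latest
EOF
chmod +x analyze-image.sh
```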

1.- Clean your image!

This first one is quite obvious, but that doesn't make it less important. Usually, when you create a Docker image, you follow the same pattern:

  • You declare a base image to leverage on.
  • You add resources to do some work.
  • You do some work.

We usually forget an additional step: cleaning up the added resources when they are no longer needed! So, it is important to make sure we remove every file we no longer need.

This also applies to other components, such as the apt cache when installing a new package, or any temporary folder used to perform an installation or some other work while building the image.
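As a sketch, the apt case looks like this: the install and the cleanup must happen in the same RUN instruction, because files deleted in a later layer still occupy space in the earlier one (the base image and package are illustrative):

```dockerfile
FROM ubuntu:22.04
# Install, then remove the apt cache in the SAME layer; a separate RUN for the
# cleanup would still leave the cache stored in the previous layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
```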

2.- Be careful about how you create your Dockerfile

As we already mentioned, each command we declare in our Dockerfile generates a new layer, so it is important to be careful with the lines we put in it. Even if it is a tradeoff against readability, it is worth trying to merge commands into the same RUN instruction to avoid creating additional layers.

Sample for a Dockerfile with merged commands

You can also use Dockerfile linters like Hadolint, which will help you with this and other anti-patterns to avoid when creating a Dockerfile.

3.- Go for docker build --squash

The latest versions of the Docker engine provide a new option to build your images with a minimized size, by squashing the intermediate layers created during the build process.

It works by providing a new flag when building your image. So, instead of doing this:

docker build -t <your-image-name>:<tag> <Dockerfile location>

You should use an additional flag:

docker build --squash -t <your-image-name>:<tag> <Dockerfile location>

To be able to use this option, you need to enable the experimental features of your Docker engine. To do that, enable them in your daemon.json file and restart the engine. If you are using Docker for Windows or Docker for Mac, you can do it through the user interface, as shown below:

How to Analyze and Reduce Docker Image Size (Layer Analysis and Best Practices)
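On Linux, where there is no Desktop UI, the equivalent of that toggle is a one-line addition to /etc/docker/daemon.json, followed by a daemon restart; a sketch:

```json
{
  "experimental": true
}
```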

Summary

These tweaks will help you make your Docker images thinner, make pushing and pulling much more pleasant, and at the same time save some money on image storage in the repository of your choice. And not only for you, but for the many others who can leverage the work you are doing. So think about yourself, but also think about the community.
