Multi-Stage Dockerfiles Explained: Reduce Docker Image Size the Right Way

The multi-stage Dockerfile is a pattern you can use to keep your Docker image at an optimized size. We have already covered the importance of keeping your Docker image as small as possible and which tools, such as dive, you can use to understand the size of each of your layers. Today we are going to follow a different approach, and that approach is a multi-stage build for our Docker images.

What is a Multi-Stage Dockerfile Pattern?

The multi-stage Dockerfile is based on the principle that the same Dockerfile can have several FROM instructions, and each FROM instruction starts a new stage of the build.

Multi-Stage Dockerfile Pattern

Why Does the Multi-Stage Build Pattern Help Reduce the Size of Container Images?

The main reason the multi-stage build pattern helps reduce the size of container images is that you can copy any artifact, or set of artifacts, from one stage to another. That is the most important point. Why? Because everything you do not copy is discarded, so you are not carrying unneeded components from layer to layer and inflating the size of the final Docker image.

How do you define a Multi-Stage Dockerfile?

First, you need a Dockerfile with more than one FROM instruction. As mentioned, each FROM marks the start of one stage of the multi-stage Dockerfile. To differentiate or reference the stages, you can name each one by adding the AS clause to the FROM instruction, as shown below:

 FROM eclipse-temurin:11-jre-alpine AS builder

As a best practice, you can also add a stage label with the same name you gave the stage, but that is not required. So, in a nutshell, a multi-stage Dockerfile will look something like this:

FROM eclipse-temurin:11-jre-alpine AS builder
LABEL stage=builder
COPY . /
# Strip the bundled JRE from the runtime archive so it is not carried forward
RUN apk add --no-cache unzip zip && zip -qq -d /resources/bwce-runtime/bwce-runtime-2.7.2.zip "tibco.home/tibcojre64/*"
# Unpack the runtime and delete the original archive so it does not stay in the layer
RUN unzip -qq /resources/bwce-runtime/bwce*.zip -d /tmp && rm -rf /resources/bwce-runtime/bwce*.zip 2> /dev/null

# Second stage: start again from a clean base image
FROM eclipse-temurin:11-jre-alpine
RUN addgroup -S bwcegroup && adduser -S bwce -G bwcegroup

How do you copy resources from one stage to another?

This is the other important part here. Once we have defined all the stages we need, and each is doing its part of the job, we need to move data from one stage to the next. So, how can we do that?

The answer is the COPY command. COPY is the same command you use to copy data from your local storage into the container image, so you need a way to indicate that this time you are not copying from your local storage but from another stage. That is where the --from argument comes in: its value is the name of the stage we learned to declare in the previous section. So a complete COPY command will look something like the snippet shown below:

 COPY --from=builder /resources/ /resources/
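Putting both pieces together, a minimal sketch of a complete multi-stage Dockerfile could look like the one below. The paths and the app.zip archive are illustrative placeholders, not taken from a real project; everything unpacked in the builder stage that is not explicitly copied with COPY --from never reaches the final image.

# Stage 1: prepare the artifacts (everything here is discarded unless copied)
FROM eclipse-temurin:11-jre-alpine AS builder
COPY . /resources/
RUN apk add --no-cache unzip && unzip -qq /resources/app.zip -d /resources/app

# Stage 2: final image, only the prepared artifacts are carried over
FROM eclipse-temurin:11-jre-alpine
COPY --from=builder /resources/app /resources/app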

What Improvement Can You Get?

That is the essential part, and it will depend on how your Dockerfiles and images are built, but the primary factor to consider is the number of layers your current image has. The bigger the number of layers, the more you can probably save on the size of the final container image with a multi-stage Dockerfile.

The main reason is that each layer duplicates part of the data, and you almost certainly do not need all of a layer's data in the next one. Using the approach described in this article, you get a way to optimize that.

Where can I read more about this?

If you want to read more, you should know that the multi-stage build is documented as one of the best practices on the official Docker web page, and there is a great article about it by Alex Ellis that you can read here.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Hadolint Explained: Dockerfile Best Practices Using CLI, Docker, and VS Code

Introduction

Hadolint is an open-source tool that helps you ensure that all the Dockerfiles you create follow the available Dockerfile best practices in an automated way. Hadolint, as the name already suggests, is a linter and, because of that, can also teach you these best practices while you write Dockerfiles yourself. We already mentioned it when covering the optimization of container image size, but today we are going to cover it more in-depth.

Hadolint is a small tool written in Haskell that parses the Dockerfile into an AST and runs rules on top of that AST. It stands on the shoulders of ShellCheck to lint the Bash code inside RUN instructions, as shown in the picture below:

Hadolint architecture: rules applied to the parsed Dockerfile AST, plus ShellCheck for RUN instructions

There are several ways to run the tool, depending on what you are trying to achieve, and we will talk a little bit about the different options.

Running it as a standalone tool

The first way is to run it as a completely standalone tool that you can download from here, and then you only need to run the following command:

 hadolint <Dockerfile path>

It will run against the Dockerfile and show any issues found, as you can see in the picture below:

Hadolint execution

For each of the issues found, it will show the line where the problem is detected, the code of the Dockerfile best practice check that is being performed (DL3020), the severity of the check (error, warn, info, and so on), and the description of the issue.

To see all the rules that are executed, you can check them in the GitHub Wiki; all of them are based on the Dockerfile best practices published by Docker on its official web page here.

For each of them, you will find a specific wiki page with all the information you need about the issue, why it should be changed, and how to change it, as you can see in the picture below:

Hadolint GitHub Wiki page

Ignore Rules Capability

You can ignore some rules if you don't want them to be applied, either because they produce false positives or because the checks are not aligned with the Dockerfile best practices used in your organization. To do that, you can include an --ignore parameter for each rule to skip:

 hadolint --ignore DL3003 --ignore DL3006 <Dockerfile>
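If you prefer not to repeat those flags on every run, Hadolint also supports a configuration file. A minimal sketch of a .hadolint.yaml ignoring the same two rules would look like this (check the project README for the full set of options):

ignored:
  - DL3003
  - DL3006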

Running it as a Docker Container

Also, the tool is available as a Docker container in the following repos:

docker pull hadolint/hadolint
# OR
docker pull ghcr.io/hadolint/hadolint

This makes it easy to introduce into your Continuous Integration and Continuous Deployment pipelines, or simply to use in your local environment if you prefer not to install software locally.
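As a reference, a typical invocation of the containerized version pipes the Dockerfile through standard input, something like this (a sketch based on the project documentation):

docker run --rm -i hadolint/hadolint < Dockerfile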

Running it inside VS Code

Like many linters, it is essential to have it close to your development environment, and this case is no different. We want the Dockerfile best practices right in the editor while we are typing, for two main reasons:

  • The sooner you see an issue, the faster you will fix it, so the code will always have better quality.
  • Once you know about an issue, you will not make the same mistake again in newer developments.

You will find Hadolint in the Extensions Marketplace, and you can install it from there:

Hadolint VS Code Extension


Once that is done, each time you open a Dockerfile it will be validated against all these Dockerfile best practices, and the issues detected will be shown in the Problems view, as you can see in the picture below:

Hadolint: VS Code Extension Execution

Those issues are re-evaluated as soon as you modify and save the Dockerfile again, so you will always see an up-to-date view of the problems detected against the Dockerfile best practices.

From Docker Desktop to Rancher Desktop: Simple Migration Guide for Developers

As most of you already know, the 31st of January is the last day to use Docker Desktop without being subject to the new licensing model, which pretty much generates a cost for any company usage. Of course, it is still free to use for open source and small companies, but it is better to check whether you meet the requirements in the official Docker documentation.

Because of that situation, I started a journey to find an alternative to Docker Desktop, since I used it a lot. My primary use is to start up server-like things for temporary usage that I don't want installed on my machine, to keep it as clean as possible (this is not always true, but it is an attempt).

During that search, I discovered Rancher Desktop, which was released not long ago and promised to be the most suitable alternative. The goal of this post is not to compare both platforms, but if you would like more information, I leave here a post that can provide it:

The idea here is to talk more about the journey of that migration. So I installed Rancher Desktop 1.0.0 on my Mac, and the installation was very, very easy. The main difference from Docker Desktop is that Rancher Desktop is built with Kubernetes in mind, whereas for Docker Desktop that came as an afterthought. So, by default, we will have a Kubernetes environment running in our system, and we can even select the version of that cluster, as you can see in the picture below:

Rancher also noticed the window of opportunity in front of them and has been very aggressive in providing an easy migration path from Docker Desktop. The first thing you will notice is that you can configure Rancher Desktop to be compatible with the Docker CLI, as you can see in the picture below.

This is not enabled by default, but it is very easy to do, and it means you will not need to change any of your "docker-like" commands (docker build, docker ps, and so on), which smooths the transition a lot.

Maybe in the future you will want to move away from everything resembling Docker, even on the client side, and move to a containerd-based approach, but for now, what I needed was to simplify the process.

So, after enabling that and restarting my Rancher Desktop, I can type my commands as you can see in the picture below:

So, the only thing I need to do is migrate my images and containers. Because I'm not a purist Docker user, I don't always follow the advice of keeping containers stateless and using volumes, especially when I only need something small for a short time. That means some of my containers also need to be moved to the new platform to avoid any data loss.

So, my migration journey had different steps (a sketch of all three is shown right after this list):

  • First of all, I will commit the stateful containers that I need to keep on the new system using the docker commit command, whose documentation you can find here.
  • Then, I will export all the images that I have now into TAR files using the docker save command, whose documentation you can find here.
  • And finally, I will load all those images on the new system using the docker load command to have them available there. Again, you can find the documentation for that specific command here.
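As a sketch of those three steps with purely hypothetical container and image names (my-db is not a real container from this article):

# 1. Snapshot a stateful container into an image (run against Docker Desktop)
docker commit my-db my-db-backup:migration
# 2. Export the image to a tar file
docker save my-db-backup:migration -o my-db-backup.tar
# 3. Import the tar file on the new runtime (run against Rancher Desktop)
docker load -i my-db-backup.tar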

To automate the process a little bit, even though I don't have that many images loaded because I try to clean up from time to time using the docker system prune command:

I prefer not to do it manually, so I will use some simple scripts to do the job.

So, to perform the export job I need to run the following command:

docker image ls -q | xargs -I {} docker image save {} -o {}.tar

This command saves each of my images to a separate tar file in a specific folder. Now, I just need to run the following command from the same folder where I ran the previous one to load all the images into the new system:

find . -name "*.tar" -exec docker load -i {} \;
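One caveat worth noting: saving images by ID, as in the export one-liner above, drops the repository and tag names, so the loaded images show up as untagged. A hedged variant that saves each image under its name instead could look like the following sketch (it assumes the images you care about are all tagged):

docker image ls --format '{{.Repository}}:{{.Tag}}' | grep -v '<none>' | while read -r img; do
  docker image save "$img" -o "$(echo "$img" | tr '/:' '__').tar"
done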

The reason why I’m not doing both actions at the same time is that I need to have running Docker Desktop for the first part and Rancher Desktop for the other. So even though I can automate that as well, I think it is not worth it.

And that’s it, now I can remove the Docker Desktop from my laptop, and my life will continue to be the same. I will try to provide more feedback on how it feels, especially regarding resource utilization and similar topics in the near future.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Reduce Docker Disk Usage Locally: Analyze and Clean Up Images, Containers, and Cache

Discover the options you have at your disposal to make efficient use of disk space in your Docker installation

The rise of containers has been a game-changer for all of us, not only on the server side, where pretty much any new workload we deploy is deployed in container form, but also in our local environments, where the same change is happening.

We embrace containers to easily manage the different dependencies we need to handle as developers, even if the task at hand is not a container-related thing. Do you need a database up and running? You use a containerized version of it. Do you need a messaging system to test some of your applications? You quickly start a container providing that functionality.

And as soon as you don’t need them, those are killed, and your system is still as clean as it was before starting this task. But there are always things that we need to handle even when we have a wonderful solution in front of us, and in the case of a local docker environment, Disk Usage is one of the most critical ones.

This process of launching new things over and over and then getting rid of them is only partly true, because all the images we have needed and all the containers we have launched are still there in our system, waiting for a new round and consuming our disk resources in the meantime. You can see that in a recent picture of my local Docker environment, with more than 60 GB used for that purpose.

Docker dashboard settings page showing the amount of disk Docker is using

The first thing we need to do is check what is using this amount of space to see if we can release some of it. To do that, we can leverage the docker system df command that the Docker CLI provides:

The output of the docker system df command

As you can see, of the 61 GB in use, 20.88 GB corresponds to the images I have, 21.03 MB to the containers I have defined, 1.25 GB to the local volumes, and 21.07 GB to the build cache. As only 18 of the 26 defined images are active, I can reclaim up to 9.3 GB, which is a significant amount.

If we would like to get more details about this data, we can always append the verbose option to the command, as you can see in the picture below:

Detailed output of the docker system df -v command
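For reference, the two invocations shown above are simply:

docker system df      # summary of images, containers, local volumes, and build cache
docker system df -v   # verbose, per-object breakdown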

So, after getting all this information, we can go ahead and prune the system. This will get rid of any unused containers and images, and to execute it, you only need to type this:

docker system prune -af

It has several options to tune the execution a little bit, which you can check on the official Docker web page.
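For example, a couple of the commonly used flags (a sketch; check the linked page for the full list) let you also remove unused volumes or restrict the cleanup to objects older than a given age:

docker system prune --all --volumes            # also remove unused volumes
docker system prune --all --filter "until=72h" # only remove objects older than 72 hours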

In my case, that helped me recover up to 40.8 GB on my system, as you can see in the picture below.

Docker disk usage after running the prune, showing the reclaimed space

But if you would like to go one step further, you can also tune some properties that control this pruning behavior. For example, the defaultKeepStorage property lets you define how much disk you want to dedicate to the build cache, which is used to reduce the amount of network traffic when building images with common layers.

To do that, you need to have the following snippet in your Docker Engine configuration section, as shown in the image below:

Docker Engine configuration with defaultKeepStorage set to 20 GB
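Since the actual snippet lives inside the screenshot, here is a sketch of what that part of the Docker Engine configuration (the daemon.json shown in the Docker Desktop settings) typically looks like; the exact structure may vary between versions:

{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "20GB"
    }
  }
}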

I hope this housekeeping process helps your local environments shine again and lets you get the most out of them without wasting a lot of resources in the process.

Portainer Explained: Evolution, Use Cases, and Why It Still Matters for Kubernetes

Discover the current state of one of the first graphical interfaces for Docker containers and how it provides a solution for modern container platforms

I want to start this article with a story that I am not sure all of you, incredible readers, know. There was a time when there were no graphical interfaces to monitor your containers. It was a long time ago, at least as we measure time in the container world. Maybe this was 2014-2015, when Kubernetes was in its initial stage and Docker Swarm had just been released and seemed the most reliable solution.

So most of us didn’t have a container platform as such. We just run our containers from our own laptops or small servers for cutting-edge companies using docker commands directly and without more help than the CLI tool. As you can see, things have changed a lot since then, and if you would like to refresh that view, you can check the article shared below:

And at that time, an open-source project provided the most incredible solution, because we didn't know we needed it until we used it, and that option was Portainer. Portainer provides an awesome web interface where you can see all the Docker containers deployed on your Docker host and deploy new ones as well.

Portainer: A Visionary Software and an Evolution Journey
Web page of portainer in 2017 from https://ostechnix.com/portainer-an-easiest-way-to-manage-docker/

It was the first one and generated a tremendous impact; it even inspired a series of other projects that were described as "the Portainer of…", like dodo, the Portainer of Kubernetes infrastructure at that time.

But maybe you ask: how is Portainer doing? Is Portainer still a thing? It is still alive and kicking, as you can see on its GitHub project page: https://github.com/portainer/portainer, with the latest release at the end of May 2021.

Now there is a Business version as well as a Community Edition, which is the one I am going to analyze here in more detail in another article. Still, I would like to provide some initial highlights:

  • The installation process still follows the same approach as the initial releases: it runs as another component of your cluster. The options available for Docker, Docker Swarm, or Kubernetes cover all the main solutions enterprises use (see the sketch after this list for the classic Docker standalone install).
  • It now provides a list of application templates, similar to the OpenShift Catalog, and you can also create your own. This is very useful for companies that rely on these templates to give developers a common deployment approach without needing to do all the work themselves.
Portainer 2.5.1 Application Template view
  • Team Management capabilities let you define users with access to the platform and group those users into teams for more granular permission management.
  • Multi-registry support: by default, it is integrated with Docker Hub, but you can add your own registries as well and pull images from them directly from the GUI.
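As a reference for the first point, the classic Docker standalone installation is a single container; the commands below are a sketch based on the Portainer CE documentation of that era, so double-check the ports and image tag for the version you install:

docker volume create portainer_data
docker run -d -p 8000:8000 -p 9000:9000 --name portainer --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data portainer/portainer-ce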

In summary, this is a great evolution of the Portainer tool while keeping the same spirit that its early users loved: simplicity and focus on what an administrator or developer needs to know, while adding more features and capabilities to keep pace with the evolution of the container platform industry.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Scan Docker Images Locally for Vulnerabilities Using Snyk (DevSecOps Guide)

Learn how you can leverage the use of Snyk inside your Docker engine installation

Security is the most relevant topic in modern architecture, and it needs to be handled from all perspectives. Having a single team auditing the platforms and the software we build is not enough.

The introduction of DevSecOps as the new normal, with security teams and policies being part of the development process so that security does not become a blocker of innovation and so that the artifacts we deploy are secure, has made this clear.

Docker image scanning is one of the most important topics we can cover regarding container images: it lets us know that all the internal components that are part of an image are free from known vulnerabilities. We usually rely on external systems to do so.

I wrote an article regarding the usage of one of the most relevant options (Harbor) from the open source world to do this job.

And this is also being done by different Docker repositories from cloud providers like Amazon ECR as of this year. But why do we need to wait until we push the images to an external Docker registry? Why can’t we do it in our local environment?

Now we can. Version 2.5.0.1 of Docker Desktop also includes the Snyk components needed to inspect Docker images directly from the command line:

https://www.docker.com/blog/combining-snyk-scans-in-docker-desktop-and-docker-hub-to-deploy-secure-containers/


Scanning Your Local Images

So, let’s start. Let’s open a new terminal and type the following command:

docker scan <image-name>

As soon as we run it, the command tells us that this scanning process will use Snyk and that we need to authorize access to that service to perform the scan.
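As an example, with a hypothetical image name, and optionally passing the Dockerfile via the --file flag of the docker scan plugin so the report can also include base image recommendations (a sketch):

docker scan --file Dockerfile myapp:1.0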

After that, we get a list of all the vulnerabilities detected, as you can see in the picture below:

Vulnerability scanning
Vulnerability scanning using your local Docker client

For each of the vulnerabilities, you can see the following data:

Vulnerability info
Detailed information provided for each of the vulnerabilities detected

We get the library with the vulnerability, the severity level, and a short description of it. If you need more details, you can also check the provided URL that is linked to a description page for that vulnerability:

Vulnerabilities page
Vulnerability detailed page from snyk

Finally, it also shows which sources introduce this library into your image, so the issue can be solved quickly.

It provides a high-level view of the whole image too, as you can see here:

Overview of Docker images
Overview of your Docker images with all the vulnerabilities detected

So, now you don’t have any excuse to not have all your images safe and secure before pushing to your local repository. Let’s do it!

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

How to Analyze and Reduce Docker Image Size (Layer Analysis and Best Practices)

Find out how you can reduce the size of your Docker images for a better experience and savings inside your organization.

Containerization is the new normal; we are all aware of that. All the new versions of corporate software and all the open-source projects now include the option to use a Docker image to run their software.

You have probably already been doing tests, or even running production workloads, based on Docker images you have built yourself. If that is the case, you probably know one of the big challenges of this kind of task: how to optimize the size of the images you generate.

One of the main reasons a Docker image can be so big is that images are built following a layered model. That means each image is created as the sum of layers, each one associated with the different commands you have in your Dockerfile.

Graphical explanation of how a Docker image is composed

Use dive to analyze the size of your images

dive is an open-source project that provides a detailed view of the composition of your Docker images. It is a command-line application that gives a great view of the content of each layer, as you can see in the picture below:

dive execution against a BusinessWorks Container Edition image

The tool has an ncurses-style interface (if you are old enough to remember how tools looked before Graphical User Interfaces were a thing, it should feel familiar) and has these main features:

  • It shows the list of layers in the top-left of the screen, with the size associated with each of them.
  • It provides general stats about image efficiency (a percentage value), an estimate of the wasted size, and the image's total size.
  • For each layer selected, you get a view of its file system with the size of each folder.
  • You also get a view of the biggest elements and how many times each of them is duplicated.
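Running it is straightforward; you point it at an image reference already present in your local Docker engine (the name below is just a placeholder):

dive <your-image-name>:<tag>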

Now you have a tool that will help you understand how your image is built and measure the effect of each tweak you make to improve its size. So, let's start with the tricks.

1.- Clean your image!

This first one is quite obvious, but that doesn't mean it is not important. Usually, when you create a Docker image, you follow the same pattern:

  • You declare a base image to leverage on.
  • You add resources to do some work.
  • You do some work.

Usually, we forget an additional step: cleaning up the added resources when they are not needed anymore! So, it is important to make sure we remove every file we no longer need.

This also applies to other components, such as the apt cache when installing a package we need, or any temporary folder used to perform an installation or some other work while building the image, as in the sketch below.
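As a hedged sketch of what that looks like in practice (the package is just an example), the download, the installation, and the cleanup of the apt cache all happen in the same RUN instruction, so the temporary files never end up in a layer:

RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*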

2.- Be careful about how you create your Dockerfile

As we already mentioned, each command we declare in our Dockerfile generates a new layer. So it is important to be very careful with the lines we have in the Dockerfile. Even if this is a tradeoff against readability, it is worth trying to merge commands into the same RUN instruction to avoid creating additional layers.

Sample Dockerfile with merged commands

You can also use Docker linters like Hadolint that will help you with this and other anti-patterns that you should avoid when you are creating a Dockerfile.

3.- Go for docker build --squash

The latest versions of the Docker Engine provide a new option to build your images with a minimized size by squashing the intermediate layers created as part of the build process.

It works by providing a new flag when you build your image. So, instead of doing this:

docker build -t <your-image-name>:<tag> <Dockerfile location>

You should use an additional flag:

docker build --squash -t <your-image-name>:<tag> <Dockerfile location>

To be able to use this option, you need to enable the experimental features of your Docker Engine. To do that, you enable them in your daemon.json file and restart the engine. If you are using Docker for Windows or Docker for Mac, you can do it through the user interface, as shown below:

Docker settings page where the experimental features can be enabled
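If you prefer to edit the file directly, the relevant snippet of daemon.json is small (a sketch; remember to restart the engine afterwards):

{
  "experimental": true
}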

Summary

These tweaks will help you make your Docker images thinner, make the process of pulling and pushing much more pleasant, and even save some money on storing the images in the repository of your choice. And not only for you, but for the many others who can leverage the work you are doing. So think about yourself, but also think about the community.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.