A beginner's guide to Docker multi-stage build process

Docker is one of the most popular containerization platforms. It provides the ability to package and run an application in a loosely isolated environment called a container. In this chapter, we are going to discuss the multi-stage build process in Docker and its advantages. Keeping Docker images as small as possible while also reducing their attack surface has always been one of the main challenges of building images.

To follow along with this chapter, you should have Docker installed on your system and a basic understanding of Docker concepts. If you haven't installed Docker yet, go to the following link:

https://www.nodexplained.com/introduction-to-docker-and-dockerizing-nodejs-application/

To demonstrate the multi-stage build process in Docker, let's use a React application. React is a JavaScript library for building user interfaces. We can create a simple React project using the create-react-app tool with the help of npx, a package runner tool. To create the project, run the following commands:

   
   	npx create-react-app multi-stage-docker-react-app
   	cd multi-stage-docker-react-app
   

The above command generates all of the necessary files, folders, and dependencies, which let us run the application right away. In production, we will serve the build files of the React application to clients using the NGINX web server. If you are new to NGINX, go to the following link:

https://www.nodexplained.com/deploy-web-applications-with-nginx-web-server/

Let's issue the following commands from the project root to set up very basic NGINX configuration files:

   
   	mkdir -p nginx-conf/conf.d
   	touch nginx-conf/nginx.conf nginx-conf/conf.d/default.conf
   

Our project directory structure now looks like this:

project directory structure for docker multi-stage build process testing

The contents of nginx.conf are as follows:

   
user nginx;
worker_processes  auto;

error_log /var/log/nginx/error.log warn;

pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  /var/log/nginx/access.log  main;
    sendfile        on;
    keepalive_timeout  65;
    include /etc/nginx/conf.d/*.conf;
}
   


The contents of default.conf, located in the conf.d directory, are as follows:

   
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name  localhost;
    location / {
        root   /docker-react-app/build;
        index  index.html index.htm;
    }

    location ~ /\. {
        deny all;
    }

    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }
}
   

You can change the folder in the root directive as needed, but it must match the path where the build files end up inside the Docker image.

We can build a Docker image using either a single-stage or a multi-stage build process.

Single-stage docker build process

The simplest way to build a Docker image is a single-stage build. Create a file named Dockerfile in the project root and paste the following content:

   
FROM alpine:3.16

ARG WORKING_DIR=$HOME/docker-react-app/
ARG PORT=80

RUN apk add --update-cache npm nginx

WORKDIR $WORKING_DIR

COPY  ./package*.json $WORKING_DIR/
RUN npm ci --only=production

COPY  . $WORKING_DIR/

RUN npm run build

COPY ./nginx-conf /etc/nginx/

EXPOSE $PORT

CMD ["nginx", "-g", "daemon off;"]
   

Issue the following command to build a Docker image from the above Dockerfile. Here, react-docker-app is the name of the image; you can give it any meaningful name.

   
   	docker build -t react-docker-app .
   

As you can see, the resulting image is 554MB, even for a very basic React application. For complex applications, the image size can increase drastically with this approach.

docker image size using normal build process

Let's look at the contents of this Docker image using the following syntax:

   
   	docker run -it image_name sh
   

Replace image_name with the name of the Docker image:

   
   	docker run -it react-docker-app sh
   
docker image container contents using normal build process

In the above container, we can see a lot of files and folders, many of which are not required at all to run the application. To serve this application, we only need the build folder. All of these unnecessary files and folders make the image much bigger and also increase the attack surface of our application. For example, if one of the npm packages we use contains malicious code, the above build approach ships that code in the final image and unnecessarily puts our application at risk.

To mitigate these types of issues, we can use the multi-stage build process.

Multi-stage docker build process

Multi-stage build syntax was introduced in Docker Engine 17.05. In a multi-stage build, a Dockerfile contains multiple FROM statements, so there are at least two of them. Each FROM instruction begins a new stage of the build and can use a different base image, which gives us greater flexibility in customizing the build process as required. Each stage is isolated from the others, but we can selectively copy artifacts from one stage to another, leaving behind everything we don't want in the final image.

Let's rewrite our Dockerfile using a multi-stage build.

   
FROM alpine:3.16

ARG WORKING_DIR=$HOME/docker-react-app/
ARG PORT=80

RUN apk add --update-cache npm

WORKDIR $WORKING_DIR

COPY  ./package*.json $WORKING_DIR/
RUN npm ci --only=production

COPY  . $WORKING_DIR/
RUN npm run build

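FROM alpine:3.16
# ARG values are scoped to the build stage that declares them, so PORT
# must be declared again in this stage for EXPOSE $PORT to use it
ARG PORT=80
RUN apk add --update-cache nginx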

COPY --from=0 $HOME/docker-react-app/build $HOME/docker-react-app/build/
COPY ./nginx-conf /etc/nginx/

EXPOSE $PORT
CMD ["nginx", "-g", "daemon off;"]
   

As we can see from the above Dockerfile, the build process is split into two stages, as indicated by the two FROM instructions. Both stages use alpine:3.16 as the base image, but each stage can use a different base image if needed. The first stage contains only the instructions needed to generate the build files of the React application. In the second stage, we copy the build artifacts from the first stage using COPY --from=0 (stage 0 is the first stage). Everything else from the first stage is discarded.

We need a web server to serve the build files of the React application, and we have chosen NGINX, one of the most popular web servers in the world. Next, we copy the custom NGINX configuration files needed to run the web server and then start it. Since the second stage begins with the last FROM statement, it produces the final Docker image, which is ready to be deployed to production.
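Each stage can also start from a different base image. The sketch below assumes the official node:16-alpine and nginx:1.23-alpine images from Docker Hub; these tags are our assumption, not part of the original setup, so pick versions that suit your project. The Node.js image already includes node and npm, and the NGINX image already provides mime.types, the 50x.html page and the same default CMD, so the apk installs are no longer needed. The /docker-react-app path is written out explicitly so it still matches the root directive in default.conf.

   
# Stage 1: the official Node.js image already includes node and npm
FROM node:16-alpine AS builder
WORKDIR /docker-react-app
COPY ./package*.json ./
RUN npm ci --only=production
COPY . .
RUN npm run build

# Stage 2: the official NGINX image provides nginx, mime.types and 50x.html
FROM nginx:1.23-alpine
# Destination must match the root directive in nginx-conf/conf.d/default.conf
COPY --from=builder /docker-react-app/build /docker-react-app/build/
# Overwrites the nginx.conf and conf.d/default.conf shipped with the image
COPY ./nginx-conf /etc/nginx/
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
   

This variant is built and run with the same docker build and docker run commands used in this chapter.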

By default, build stages are not named. Referring to a build stage by its zero-based index, as shown above, is confusing, because a number says nothing about what the stage does. Also, if the build stages are reordered, we have to fix every reference to them. So, for readability, we should give a meaningful name to each build stage that will be referenced later. To name a build stage, we can use the following syntax:

   
   	FROM BASE_IMAGE:BASE_IMAGE_VERSION as BUILD_STAGE_NAME
   

Let's name our first build stage builder:

   
   	FROM alpine:3.16 as builder
   

To reference this build stage from another build stage, we can use the following syntax:

   
   	COPY --from=PREVIOUS_BUILD_STAGE_NAME SOURCE_FOLDER_FROM_PREVIOUS_BUILD_STAGE DESTINATION_FOLDER_FROM_CURRENT_BUILD_STAGE
   

For example:

   
   	COPY --from=builder $HOME/docker-react-app/build $HOME/docker-react-app/build/
   

The final version of our Dockerfile looks like this:

   
FROM alpine:3.16 as builder

ARG WORKING_DIR=$HOME/docker-react-app/
ARG PORT=80

RUN apk add --update-cache nodejs npm

WORKDIR $WORKING_DIR

COPY  ./package*.json $WORKING_DIR/
RUN npm ci --only=production

COPY  . $WORKING_DIR/
RUN npm run build

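FROM alpine:3.16
# ARG values are scoped to the build stage that declares them, so PORT
# must be declared again in this stage for EXPOSE $PORT to use it
ARG PORT=80
RUN apk add --update-cache nginx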

COPY --from=builder $HOME/docker-react-app/build $HOME/docker-react-app/build/
COPY ./nginx-conf /etc/nginx/

EXPOSE $PORT
CMD ["nginx", "-g", "daemon off;"]
   

Even though we have used only two build stages in this Dockerfile, we can add as many build stages as needed.
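To illustrate more than two stages, here is a hypothetical three-stage variant of the same Dockerfile; the stage names deps and builder are our own choice. A later stage can start FROM an earlier named stage and inherits its files and WORKDIR. The sketch also writes /docker-react-app explicitly instead of $HOME/docker-react-app, assuming, as the root directive in default.conf suggests, that $HOME is not defined at build time in alpine:3.16 and therefore expands to an empty string.

   
# Stage 1 (deps): install dependencies only
FROM alpine:3.16 AS deps
RUN apk add --update-cache nodejs npm
WORKDIR /docker-react-app
COPY ./package*.json ./
RUN npm ci --only=production

# Stage 2 (builder): starts FROM deps, so node_modules and WORKDIR carry over
FROM deps AS builder
COPY . .
RUN npm run build

# Stage 3 (final image): NGINX plus the static build output only
FROM alpine:3.16
ARG PORT=80
RUN apk add --update-cache nginx
COPY --from=builder /docker-react-app/build /docker-react-app/build/
COPY ./nginx-conf /etc/nginx/
EXPOSE $PORT
CMD ["nginx", "-g", "daemon off;"]
   

Because the stages are named, docker build --target builder -t react-docker-app-builder . stops after the builder stage (the image name here is hypothetical), which can be handy for inspecting the intermediate build output.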

Issue the following command to build the Docker image:

   
   	docker build -t react-docker-app .
   
docker multi-stage build logs

As we can see from the image below, the resulting image is just 10MB. Compared with the 554MB image created by the single-stage build above, that is a whopping 544MB reduction, even for a simple React application. The savings can be even larger for complex applications with many features and dependencies.

docker image size using multi-stage build process

Let's run this Docker image using the following command:

   
   	docker run -d --name react-docker-app -p 80:80 react-docker-app
   

Here, the name of the container is react-docker-app (you can name it as you wish), and the container is created from the Docker image called react-docker-app. The -p 80:80 option maps port 80 on the host to port 80 in the container.

docker image size using multi-stage build containers

Let's inspect the contents of the container for this React application using the following command:

   
   	docker exec -it react-docker-app /bin/sh
   

As we can see, the application directory contains only the build folder, which is exactly what we need to run the application in a container. All the unnecessary files, folders and dependencies have been left out of the final Docker image. This also improves the security of the application by reducing its attack surface.

docker image size using multi-stage build process

Now, navigate to http://localhost/ in a browser to view the React app:

docker container running create react app - react application


Let's also apply the multi-stage build process to our travel application project. Please go to the following links to get the Dockerfile for the travel app:

https://www.nodexplained.com/introduction-to-docker-and-dockerizing-nodejs-application/

https://github.com/nodexplained/travel-application

Single-stage docker build process with Node.js


This is the Dockerfile for our travel application:

   
FROM alpine:3.16

ARG WORKING_DIR=$HOME/travel-app/
ARG PORT=3000

RUN apk add --update-cache nodejs npm

WORKDIR $WORKING_DIR

COPY  ./package*.json $WORKING_DIR/
RUN npm install --only=production

COPY  . $WORKING_DIR/

EXPOSE $PORT

CMD [ "node", "index.js" ]
   

Let's build the Docker image using the following command:

   
   	docker build -t travel-app . 
   
node.js docker single stage build process

Here, we can see that the resulting Docker image is 66.7MB. Let's build the same application with a multi-stage build.

Multi-stage docker build process with Node.js

In our Node.js travel application, npm is only needed to install the dependencies. The final image needs just the Node.js runtime, so by leaving npm out of it, we can save some space.

   
FROM alpine:3.16 as base

ARG WORKING_DIR=$HOME/travel-app/
RUN apk add --update-cache nodejs npm
WORKDIR $WORKING_DIR

COPY  ./package*.json $WORKING_DIR/
RUN npm install --only=production
COPY  . $WORKING_DIR/


FROM alpine:3.16
ARG WORKING_DIR=$HOME/travel-app/
ARG PORT=3000
RUN apk add --update-cache nodejs
WORKDIR $WORKING_DIR
COPY --from=base $HOME/travel-app/ $WORKING_DIR
EXPOSE $PORT
CMD [ "node", "index.js" ]
   

After building with the multi-stage Dockerfile, we can see that the resulting Docker image is 58.1MB. Although the difference in size is not huge (about 8.6MB), we still saved some space. With complex projects, we can save even more space and also gain greater flexibility in customizing the build.

node.js docker multi stage build process
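One further, small optimization, shown as a sketch whose effect on size we have not measured here: apk add --no-cache fetches the package index on the fly instead of storing it under /var/cache/apk, so the index is not left behind in an image layer. Applied to the travel app, with the paths written out explicitly (again assuming $HOME expands to an empty string at build time), the Dockerfile could look like this:

   
FROM alpine:3.16 AS base
WORKDIR /travel-app
# --no-cache: don't keep the APK index under /var/cache/apk
RUN apk add --no-cache nodejs npm
COPY ./package*.json ./
RUN npm install --only=production
COPY . .

FROM alpine:3.16
ARG PORT=3000
# Only the Node.js runtime is needed to run index.js; npm is left out
RUN apk add --no-cache nodejs
WORKDIR /travel-app
COPY --from=base /travel-app/ ./
EXPOSE $PORT
CMD [ "node", "index.js" ]
   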

That's it for this chapter.