Container optimization

Docker

As you may already know, Docker doesn't automatically remove images. An image is composed of layers that are cached and reused as long as they don't change between builds. When you update an image or pull a new one from Docker Hub, Docker creates new caches for the changed layers, so disk usage keeps growing. During development we modify our code and build a new image over and over, which means a development PC's disk fills up easily.

Download my source code first from here

This is one of Docker learning series posts. If you want to learn Docker deeply, I highly recommend Learn Docker in a month of lunches.

  1. Start Docker from scratch
  2. Docker volume
  3. Bind host directory to Docker container for dev-env
  4. Communication with other Docker containers
  5. Run multi Docker containers with compose file
  6. Container’s dependency check and health check
  7. Override Docker compose file to have different environments
  8. Creating a cluster with Docker swarm and handling secrets
  9. Update and rollback without downtime in swarm mode
  10. Container optimization
  11. Visualizing log info with Fluentd, Elasticsearch and Kibana

How to check Docker's disk usage

Let's check the disk usage first. The following is my result.

$ docker system df
TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
Images              45                  4                   3.914GB             3.885GB (99%)
Containers          5                   0                   844B                844B (100%)
Local Volumes       14                  0                   48.49MB             48.49MB (100%)
Build Cache         0                   0                   0B                  0B

It shows that there are 45 images on my PC, but this number includes obsolete images because I built the same image multiple times. When we rebuild an image with the same tag, Docker changes the repository and tag of the old image to <none>.

$ docker images
REPOSITORY                                               TAG                 IMAGE ID            CREATED             SIZE
poke-app                                                 v2                  457c8be9a52d        8 days ago          960MB
poke-app                                                 v1                  531c51873cb5        8 days ago          960MB
<none>                                                   <none>              91c7e42ff112        9 days ago          960MB
<none>                                                   <none>              875fff8d24a5        9 days ago          960MB
<none>                                                   <none>              ae60a82f9ad0        9 days ago          960MB

I recommend running docker system prune regularly to remove unnecessary data. When you run the command you will see the following prompt. Once you enter y, Docker starts deleting the data. I omitted the deletion output here.

$ docker system prune
WARNING! This will remove:
  - all stopped containers
  - all networks not used by at least one container
  - all dangling images
  - all dangling build cache

Are you sure you want to continue? [y/N] y

When I check the disk usage again, it shows that the number of images is now 25 and the unused containers have been removed.

$ docker system df
TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
Images              25                  0                   3.506GB             3.506GB (100%)
Containers          0                   0                   0B                  0B
Local Volumes       14                  0                   48.49MB             48.49MB (100%)
Build Cache         0                   0                   0B                  0B
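Note that plain docker system prune only removes dangling images, which is why the 25 tagged-but-unused images above survived. If you want to reclaim those as well, the command accepts more aggressive flags. Be careful: this deletes every image not used by a running container, plus unused volumes.

```shell
# Also remove unused (not just dangling) images, and unused local volumes
docker system prune -a --volumes
```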

Optimize Dockerfile

In a Dockerfile we use the COPY command to copy necessary files into the image. Do you think the following Dockerfiles create exactly the same image? There are three files in the src directory: app.js, big-image.jpg and README.md.

# Dockerfile.v1
FROM alpine:latest
COPY ./src /src/

# Dockerfile.v2
FROM alpine:latest
COPY ./src /src/
WORKDIR /src
RUN rm README.md big-image.jpg

# Dockerfile.v3
FROM alpine:latest
COPY ./src/app.js /src/

The result is as follows. The first two images are the same size but the third is not. Even though v2 removes the big-image.jpg file, the image size stays the same because Docker has already stored the data in a layer. Since the image is composed of those layers, the total size doesn't change even if the file is removed in a later layer.

$ docker images
REPOSITORY                                               TAG                 IMAGE ID            CREATED             SIZE
opt                                                      v2                  5428ba82c105        8 seconds ago       8.63MB
opt                                                      v1                  19177113b2c0        12 seconds ago      8.63MB
opt                                                      v3                  a0297eb93b4a        6 minutes ago       5.57MB

We learned how to reduce disk usage from this simple example, but we don't want to list every necessary file in a COPY command. Instead, place a .dockerignore file in the directory where the Dockerfile exists. The format is the same as .gitignore, and it excludes the specified files from the build context.
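As a minimal sketch for the example above, a .dockerignore excluding the two unnecessary files could look like this (the file names come from the src directory listed earlier):

```
# .dockerignore — same syntax as .gitignore
**/README.md
**/big-image.jpg
```

With this file in place, a broad COPY ./src /src/ behaves like Dockerfile.v3, because the excluded files never enter the build context in the first place.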

Choose proper base image

In my blog posts I use two Node.js images: one for development and another for production. As the following result shows, the disk sizes are really different. The yutona/nodejs-dev image includes lots of development tools such as npm, but those tools are not actually necessary in a production container.

Choosing the right image reduces not only disk size but also security risk: the more unnecessary tools exist in a container, the more vulnerable it is. We should choose the right Docker image to reduce both. I won't write about vulnerability scanning software in this post, but I will try some in the future, for example Anchore, Clair or Aqua. If we find vulnerabilities in our base image that we can't overlook, we should skip that update. This means we should pin the base image to a fixed version in our Dockerfile so that it is not updated automatically; we should decide when to update the image ourselves.

$ docker images
REPOSITORY                                               TAG                 IMAGE ID            CREATED             SIZE
yutona/nodejs                                            latest              975fff6bc723        About an hour ago   203MB
yutona/nodejs-dev                                        latest              ee0c97e21dc0        3 weeks ago         960MB

Minimize layer size

As mentioned above, Docker generates a cache layer for each command. Even if we download a compressed file, unpack it and remove some files in subsequent commands, the total disk usage doesn't decrease because the later layers just hide the deleted files. However, if all of those commands are written as a single RUN command, they produce a single layer and its size is reduced.
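As a sketch of that pattern (the URL and file names here are placeholders, not from my project), the download-unpack-cleanup steps can be chained into one layer:

```dockerfile
FROM alpine:latest

# Bad: three RUN commands create three layers; the archive stays
# stored in a lower layer even after the rm.
# RUN wget https://example.com/app.tar.gz
# RUN mkdir /app && tar -xzf app.tar.gz -C /app
# RUN rm app.tar.gz

# Better: one RUN command chains the steps, so the removed archive
# is never persisted in any layer.
RUN wget https://example.com/app.tar.gz \
    && mkdir /app \
    && tar -xzf app.tar.gz -C /app \
    && rm app.tar.gz
```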

Multi stage build

Combining multiple commands into one works, but we can also use a multi-stage build for optimization. The Dockerfile looks like the one below. It's a really small example, but in a real project the prepare stage would produce several artifacts. The final image created by this Dockerfile is the same as the one from Dockerfile.v3 listed above.

FROM alpine:latest as prepare
COPY ./src /src/

FROM alpine:latest
COPY --from=prepare /src/app.js /src/

Let’s create two images from this Dockerfile and compare the disk size.

# Create an image from prepare stage
$ docker image build -t opt:multi-stage-prepare -f Dockerfile-multi-stage  --target prepare .
# Create the final image
$ docker image build -t opt:multi-stage -f Dockerfile-multi-stage .
$ docker images -f reference=opt
REPOSITORY          TAG                   IMAGE ID            CREATED             SIZE
opt                 v2                    107dcabe64c5        38 hours ago        8.63MB
opt                 multi-stage-prepare   19177113b2c0        38 hours ago        8.63MB
opt                 v1                    19177113b2c0        38 hours ago        8.63MB
opt                 multi-stage           a0297eb93b4a        38 hours ago        5.57MB
opt                 v3                    a0297eb93b4a        38 hours ago        5.57MB

As you can see from the result, the disk size of the prepare stage is the same as v1/v2, but the final image is smaller than those. Interestingly, the image IDs are exactly the same. A multi-stage build is a very good approach because it keeps the Dockerfile simple and easy to maintain. In addition, Docker generates a cache for each command, which means our build process becomes faster because the cache can be reused for some of the layers.

Conclusion

Image size optimization matters when a container runs with limited resources. Reusing the cache as much as possible also reduces development time: we build our image frequently during development and don't want to wait long for each build. Whether the cache can be used depends on the order of the commands in the Dockerfile, so we should consider how to optimize it.
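For example, a common ordering trick for a Node.js image (a sketch; the file layout is assumed for illustration) is to copy the dependency manifest and install dependencies before copying the source code, so that editing app.js doesn't invalidate the expensive install layer:

```dockerfile
FROM node:18-alpine
WORKDIR /app

# Dependency layers first: rebuilt only when package*.json changes.
COPY package*.json ./
RUN npm ci

# Source code last: editing app.js invalidates only this layer.
COPY ./src ./src
CMD ["node", "src/app.js"]
```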
