Node.js Best Practices for Docker

TL;DR

  • Use a single Dockerfile and docker-compose.yml where possible for simplicity.
  • Use bind mounts for development to sync changes without rebuilding images.
  • Be cautious: with bind mounts, files on the host override files inside the container (host > docker).
  • Avoid installing dependencies on the host if using a different OS than the Docker image.
  • Use WORKDIR instead of RUN mkdir to create and switch directories.
  • Prefer COPY over ADD unless extracting tar files or copying from URLs.
  • Run your containers as a non-root user (USER node) for better security.

Introduction

This article provides best practices for Dockerizing Node.js applications for both development and production.

It assumes basic knowledge of Docker, Dockerfiles, and docker-compose. Instead of a step-by-step guide, I'll focus on practical snippets and insights for an optimized setup.

While the concepts here apply primarily to Node.js, many of them can be used for other languages as well. However, every project has different requirements, so consider these best practices as guidelines rather than strict rules.

Using Docker for Development

Before we proceed with the actual best practices, I'd like to talk a bit about using docker for development first

This is because using docker for dev will define some of our constraints when writing our Dockerfile for production

You can actually use multiple Dockerfiles and docker-compose.yml files by providing the -f flag

Here's an example where we run docker-compose up using docker-compose.dev.yml as the docker-compose file

docker-compose -f docker-compose.dev.yml up

Here's another example where we build an image using a custom Dockerfile name like Dockerfile.prod.

This also allows us to specify a Dockerfile in a different path, which can be useful in certain scenarios like monorepos

docker build -t mplibunao/some-app -f ./server/Dockerfile.prod .

Here's an example of a docker-compose file that specifies a custom Dockerfile or custom path to the dockerfile

version: "3.6"
services:
  server:
    build:
      context: ./
      dockerfile: ./server/Dockerfile

For the most part though, the Dockerfile for dev and prod should be similar. In fact it is best to use one Dockerfile for all of your environments instead of one for each (dev, ci, prod)

This is because using multiple files creates too many combinations of flags, commands and files

Here is an example of package.json scripts that use different docker-compose.yml files

"scripts": {
  "docker:prod": "sudo docker-compose -f docker-compose.prod.yml up -d",
  "docker:dev": "docker-compose -f docker-compose.dev.yml up",
  "docker:test": "docker-compose -f docker-compose.test.yml up --abort-on-container-exit",
  "docker:seed:prod": "sudo docker-compose -f docker-compose.prod.yml run rental-pay-api sh -c 'npm run seed'",
  "docker:seed:dev": "docker-compose -f docker-compose.dev.yml run rental-pay-api sh -c 'npm run seed'",
  "docker:test-watch": "docker-compose -f docker-compose.test.yml run rental-pay-api yarn test-watch"
},

This is just an example, as you might not even use docker-compose for environments other than dev. But you get what I mean.

In any case, we will talk in more detail how to address this issue later

Why docker for dev?

I've met a lot of very talented devs who do not like using docker for development which is fine as each approach has its own trade-off

Pros:

  • Eliminates problems with software versions (think postgres, node) and managing them. Note that you can handle this using version managers like asdf, nvm, rvm
  • Easy on-boarding for devs. Allows developers to move across projects with ease especially for juniors working remote
    • To be fair, for most small or monorepo projects this might not be even a problem
    • But for more complex architectures with lots of moving parts and services, it makes me feel better knowing that juniors will be able to start contributing much faster
  • Allows you to deploy anywhere for prod (heroku, dokku, digital ocean, aws)
  • If you're using docker for prod, might as well use docker for dev. This has the added benefit of being as close to production as possible

Cons:

  • For setups which are not fully dockerized (eg: postgres running on docker but the rest not on docker), it's a bit harder to make these dockerized services talk with non-dockerized services
    • Yes there are work-arounds like host.docker.internal, but it doesn't work the same across different OSes
    • I prefer all services running using docker-compose so everything is run in a single docker network
  • Docker uses more memory. Might encounter difficulty running apps using multiple containers (you can adjust the memory usage)
  • Learning curve. You need to learn docker to use docker. You also encounter lots of small issues like permission errors when editing files generated by containers, etc
  • Knowing docker will not make you good at devops. You still need to know about sysadmin or cloud technologies like file permissions, load balancers, etc.

Using docker-compose for development

For development, I like to use docker-compose since it handles a lot of stuff for me like networking, volumes, env variables, etc through the docker-compose.yml

I will walk through the important things you need to get up and running for dev

Volumes

Persistent volumes are important for using docker for dev as you need to persist the changes you are making to the code

There are 2 mount types: volumes and bind mounts

  • Volumes - allow you to persist data outside of the container UFS (Union File System)

Similar to bind mounts but stored in a part of the host system managed by docker

On Linux machines, you can navigate to the location returned when you run docker volume inspect <volume-name-or-id>
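
For example (volume name assumed; output trimmed), the Mountpoint field shows where the data lives on the host:

$ docker volume inspect todo_db_data
[
    {
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/todo_db_data/_data",
        "Name": "todo_db_data"
    }
]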

For Windows/Mac it uses a bit of magic: Docker actually runs a Linux VM, so that data lives inside the VM (and is thus not directly accessible from the host)

Can be anonymous or named volumes

# docker-compose.yml with postgres using named volume
# An anonymous volume would just be the container path on its own (eg: `- /var/lib/postgresql/data`)
# but named volumes are easier to identify and manage, so prefer those
version: "3.6"
services:
  postgres:
    image: postgres:13.2
    restart: always
    ports:
      - "5432:5432"
    volumes:
      - db_data:/var/lib/postgresql/data
    env_file:
      - .env

volumes:
  db_data:

  • Bind Mounts - Mounting or sharing a host directory or file into a container

Links container path to host path (Eg: ~/projects/todo in the host machine links to /app inside the container)

The file or directory does not need to exist on the Docker host already. It is created on demand if it does not exist yet

This can be bad because you are relying on the host machine's filesystem

Eg: Let's say we have the following docker-compose.yml for our todo project

version: "3.6"
services:
  web:
    volumes:
      # Creates bind mount between `~/projects/todo/node_modules` and `/app/node_modules`
      - ./node_modules:/app/node_modules
    ports:
      - "3000:3000"

The volume part of the yaml file links ~/projects/todo/node_modules and /app/node_modules

When you run yarn install during the image build process, it will create a node_modules folder with dependencies inside the container

If you just pulled the git repo you would likely not have a node_modules in your host machine ~/projects/todo without running yarn install

This becomes an issue when you run docker-compose up, since that is when the bind mount defined in docker-compose.yml is created.

Because ~/projects/todo/node_modules does not exist on the host, Docker creates it as an empty directory and mounts it over /app/node_modules inside your container, essentially hiding the dependencies that were installed during the build.

The same is true if you have an empty node_modules on your host machine; the node_modules inside the container will appear empty as well.

Thus you should be careful when using bind mounts, as they can hide files in your container if the host path does not match what the container expects

Another possible case is if your host is running a different OS like macOS while your container is running a Linux distro like Ubuntu or Alpine

If you yarn install on your host machine and then docker-compose up, you might encounter some issues with your dependencies

This is because some dependencies are built differently on different operating systems (eg: native modules like argon2 or bcrypt that compile platform-specific binaries)

So when you docker-compose up, the node_modules from inside your container gets shadowed by your macOS node_modules, which can cause dependency errors
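
A common mitigation (sketched here, paths assumed to match the examples above) is to add an anonymous volume for node_modules on top of the bind mount, so the dependencies installed during the image build are not hidden by the host directory:

version: "3.6"
services:
  web:
    volumes:
      # Bind mount the source code so edits sync without rebuilding
      - .:/app
      # The anonymous volume takes precedence for this subpath, keeping
      # the node_modules that were installed during the image build
      - /app/node_modules
    ports:
      - "3000:3000"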

So if bind mounts are so dangerous, why use them at all?

  • This is because bind mounts allow the container to change the host filesystem.

Eg: When you install a node dependency or generate migration file using docker, the migration file or dependency files are added to your file system as well.

I realized this is not really a big deal, since you can also generate your migration file from the host and it will still be picked up inside the container, because the host side of the bind mount takes precedence (host > docker)

  • More importantly this allows you to write code without constantly needing to rebuild your docker image

  • Bind mounts are also very performant which is perfect for development

version: "3.6"
services:
  web:
    volumes:
      # Creates bind mount between `~/projects/todo` and `/app`
      # Keeps your host and docker filesystem in-sync
      - .:/app
    ports:
      - "3000:3000"

Note: Similar to volumes you can also create named bind mounts

version: "3.6"
services:
  web:
    volumes:
      - web:/app
    ports:
      - "3000:3000"

volumes:
  web:
    # bind named volume to current working directory
    driver_opts:
      type: none
      device: ${PWD}
      o: bind

Env variables for docker-compose

Since we are going to be using docker-compose for development, we will be talking more about passing env variables in that context

Env variables are an important part of developing applications; they allow us to deploy our apps across different environments with ease.

There are a few ways to set env variables

  • Through the Dockerfile's ENV
# Hard-code into your Dockerfile
ENV NODE_ENV production

# Or pass it in at build time through ARG
# build with: docker build --build-arg NODE_ENV=development .
# (you can also pass build args from docker-compose; see the sketch after this list)
ARG NODE_ENV
ENV NODE_ENV $NODE_ENV

# You can also set a default value for ARG
ARG NODE_ENV=production
ENV NODE_ENV $NODE_ENV
  • Through environment in docker-compose.yml
version: "2"
services:
  rental-pay-api:
    command: yarn dev -- -L
    environment:
      - NODE_ENV=development
      - PORT=3001
      - JWT_EXPIRATION_MINUTES=1440
      - JWT_SECRET=bA2xcjpf8y5aSUFsNB2qN5yymUBSs6es3qHoFpGkec75RCeBb8cpKauGefw5qy4
  • Through passing an env file in docker-compose.yml
version: "3.6"
services:
  web:
    volumes:
      - .:/app
    ports:
      - "3000:3000"
    # pass ./.env file
    env_file:
      - .env
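
If you build your images through docker-compose, you can also feed the ARG approach above from the compose file; a minimal sketch:

version: "3.6"
services:
  web:
    build:
      context: ./
      # Passed to the Dockerfile's ARG NODE_ENV at build time
      args:
        NODE_ENV: development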

Using the env_file is my preferred way since it allows you to manage your env variables through your env file, which is normally how people do it in non-dockerized setups anyway

It also allows you to use different env files for different setups; for example, if you are using docker-compose for prod as well, you can pass an .env.production file instead

Commands for development

In this section, we will mostly be talking about docker-compose commands over docker commands

The syntax for the two is actually very similar, with the only difference being which alias or name you use to refer to a particular container.

docker commands require using either the container id or name to refer to a container. For containers started using docker-compose the naming scheme is usually <name-of-directory>_<service-name>_<number>. You can usually check the name and id of a container by running docker ps or docker container ls

$ docker container ls
CONTAINER ID   IMAGE                 COMMAND                  CREATED         STATUS         PORTS                                       NAMES
f097cd0930dc   reddit-clone_web      "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   reddit-clone_web_1
907cd880f15d   reddit-clone_server   "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   0.0.0.0:4000->4000/tcp, :::4000->4000/tcp   reddit-clone_server_1
16e9d7a1cf42   redis:6.2.3           "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp   reddit-clone_redis_1
458282ae41d9   postgres:13.2         "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   0.0.0.0:5432->5432/tcp, :::5432->5432/tcp   reddit-clone_postgres_1

$ docker run --rm reddit-clone_web yarn add typescript

In the snippet above, we check the output of docker container ls, then use docker run to start a new container from the reddit-clone_web image with the command yarn add typescript to install typescript. Note that docker run takes an image name (reddit-clone_web), while commands like docker exec and docker logs take the container name or id (reddit-clone_web_1).

docker-compose commands on the other hand allow you to refer to a container using its service name

version: "3.6"
services:
  # postgres is the service name I'm referring to
  postgres:
    image: postgres:13.2
    restart: always
    ports:
      - "5432:5432"
    volumes:
      - db_data:/var/lib/postgresql/data
    env_file:
      - .env

volumes:
  db_data:

$ docker-compose run postgres psql -U postgres

In the command above, we run the psql command in the postgres container

For running commands inside the container, either approach will generally work, but my personal preference is to use docker-compose run or docker-compose exec over docker run or docker exec. This is because using the service name allows me to run commands without first needing to check the name of the container.

Maybe it's just me, but I always end up having to double-check the container name/id before I run commands. That said, you can set a fixed container name in the docker-compose.yml using container_name, which helps with this issue. Also, I'm not sure if it's a zsh plugin, but I'm able to auto-complete container names, so that should help if you prefer docker commands
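
For reference, here is a minimal sketch of pinning the container name (the name itself is arbitrary):

version: "3.6"
services:
  web:
    # Containers for this service will always be named reddit-clone-web
    container_name: reddit-clone-web
    ports:
      - "3000:3000"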

Using docker-compose run also starts the container's dependencies, which can be positive if you need these services running (like your database as a dependency for your backend when running migrations) or negative if you don't really need the other services running for a particular command (it just takes longer to start)
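
If you don't want the dependencies started for a one-off command, docker-compose run has a flag for that (the yarn lint script is just a placeholder):

# --no-deps skips starting linked services like postgres or redis
$ docker-compose run --rm --no-deps web yarn lint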

Now I'm going to list some of the important commands I use for development. You may create aliases or use bash scripts or makefiles to make running these commands more convenient

  • Run

Allows you to run a container

Also allows you to pass a command to override the CMD instruction in your Dockerfile

docker run --rm postgres:13.2 psql -U postgres
docker-compose run --rm postgres psql -U postgres

Flags:

  • --rm - Remove the container after it exits (always include this)
  • --service-ports - By default, docker-compose run does not create the ports defined for the service, to avoid collisions with already running containers. This flag tells it to map the service's defined ports (useful for long-running containers you need to connect to)
  • Exec

Allows you to run a command against an existing container (already running)

This is useful in certain situations, like running one-off commands on already running containers without needing to stop them first

$ docker exec reddit-clone_web_1 yarn add typescript
$ docker-compose exec web yarn add typescript

In the example above, we install a new node dependency in the already running container, without needing to stop it first and start a new one with run. This is particularly useful when you are using a different OS than your docker container, as it prevents installing a macOS version of your deps, which can happen if you install on your host machine.

Another use-case is for opening a bash shell inside your container to inspect or run commands (you can also do this with run)

$ docker exec -it reddit-clone_web_1 bash
# docker-compose exec allocates a TTY by default, so no -it flags are needed
$ docker-compose exec web bash
  • Logs

Allows you to view logs of a specific container

# The --tail flag retrieves only the last N lines
$ docker logs --tail 100 reddit-clone_web_1
$ docker-compose logs --tail 100 web

Dockerfile

WORKDIR

Sets the working directory for all the instructions that follow it (unlike RUN cd /some/path, which only applies within that single RUN instruction)

Instead of using the following to create the directory and make it your working directory

RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

You can just use WORKDIR directly as it creates the directory if it doesn't exist yet
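
Here is a minimal sketch of how this usually looks in practice (the paths are assumptions, not from this project):

# Dockerfile
FROM node:14.16.1-alpine

# Creates /usr/src/app if it doesn't exist and makes it the
# working directory for all of the instructions below
WORKDIR /usr/src/app

# Install dependencies first so this layer is cached when only source changes
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

COPY . .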

COPY vs ADD

Dockerfiles give you two ways to copy files from the host into an image: COPY and ADD

They are similar but ADD provides additional functionality:

  • Allows adding files from URLs instead of a file or directory
  • Allows you to extract tar files

Note: In most cases where you would use a URL, you download an archive and then use a RUN command to extract it. At that point you might as well use RUN with curl instead of ADD, so you can chain the download, extraction, and cleanup into one RUN instruction and keep the Docker image smaller.
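
For example (the URL and paths here are just placeholders), fetching and cleaning up in a single layer looks like this:

# Download, extract, and clean up in one RUN instruction so none of the
# intermediate files end up baked into the image layers
RUN curl -fsSL https://example.com/some-tool.tar.gz -o /tmp/some-tool.tar.gz \
    && tar -xzf /tmp/some-tool.tar.gz -C /usr/local/bin \
    && rm /tmp/some-tool.tar.gz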

Which one should you use?

For copying files from the host to the docker image, it is recommended to use COPY since it is more explicit about your intent

Unless you have specific use-cases for using ADD, it's better to just use COPY to avoid surprises

Non-root User

By default, docker runs containers as root (uid=0) which can be a security issue.

Always try to run containers as a non-root user. The node image provides the user node for exactly this purpose

# Dockerfile
FROM node:14.16.1-alpine

# Put this at the end
USER node

Alternatively you can also create a non-root user using the following commands

# Dockerfile
FROM centos:7

# Add new user john with user id 8877
RUN useradd -u 8877 john
# Switch to user
USER john

Issues with non-root user

There are a lot of issues I've encountered when trying to run my containers as non-root users. I will try to cover all of them

Permission denied when editing the file

This can happen even if you are not using a non-root user inside your docker container, and it also depends on your host OS.

This normally happens when trying to edit a file generated by your docker container in your IDE. By default the docker container runs as the root user, while on your host machine you are not logged in as root. This results in your IDE failing to write to the file because of a lack of permissions

$ docker-compose run web sh
$ id
uid=0(root) gid=0(root)

# in your host machine
$ id
uid=1000(mp) gid=1000(staff)

In the snippet above we run id on both the container and the host machine to see that we are using a different uid and gid

One solution is to run the following command to change the ownership of the files on your host machine (change mp to your user)

sudo chown -R mp:mp .

Another is to make the uid inside the container and the host match. By default the first user in linux uses the value of 1000 for both uid and gid. Coincidentally, the USER node we mentioned above uses a value of 1000 for uid and gid so that solves our issue already.

Note: For macOS this is not usually an issue even though it uses a different uid and gid, because of how Docker for Mac works
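
If your host user does not use 1000, a common pattern is to pass your uid/gid as build args so the container user matches your host user. This is just a sketch (the app user name is arbitrary, and it assumes the uid/gid are not already taken in the base image):

# Dockerfile
FROM centos:7

# Build with: docker build --build-arg UID=$(id -u) --build-arg GID=$(id -g) .
ARG UID=1000
ARG GID=1000
RUN groupadd -g $GID app && useradd -u $UID -g $GID app
USER app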

Permission error starting a container when using a non-root user

This happens because docker builds the image as the root user. However, when we start the container we are running as a different user, which results in permission errors similar to the above issue.

The only difference is we can't run sudo chown node:node . now

The solution is to make the files owned by the non-root user during build time

# Copy files as the node user instead of root
COPY --chown=node:node . .

# compile typescript to javascript
RUN yarn build

# chown the build output during build time since it was created by root
RUN chown -R node:node /app/dist

Normally, we wouldn't need to use the second solution and would only need to use COPY --chown. However, there are cases where we need to RUN chown -R during build time like when compiling the source code.

In the example above, we are compiling the typescript code into javascript inside the /app/dist directory. Since we are running as the root user during build time, /app/dist ends up owned by root. Therefore we have to change its ownership to our node user
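
Putting it all together, here is a minimal sketch of what such a Dockerfile could look like (the paths, the yarn build script, and dist/index.js are assumptions based on the snippets above):

# Dockerfile
FROM node:14.16.1-alpine

WORKDIR /app

# Install dependencies first so this layer is cached when only source changes
COPY --chown=node:node package.json yarn.lock ./
RUN yarn install --frozen-lockfile

# Copy the rest of the source as the node user
COPY --chown=node:node . .

# Compile typescript to javascript, then hand the build output to the node user
RUN yarn build && chown -R node:node /app/dist

# Drop privileges for runtime
USER node

CMD ["node", "dist/index.js"]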
