Container’s dependency check and health check


Software sometimes falls into a failing state. When that happens, we want to restart it to keep the service running, and Docker offers the HEALTHCHECK functionality for this purpose. Some containers also depend on other containers; in that case, the dependencies must start up before the container itself does. We can cover this case in a similar way.

The HEALTHCHECK function is important when we run containers under an orchestration system like Docker Swarm or Kubernetes, because the orchestrator can keep the services running without downtime or data loss: it starts a new container when one of the containers becomes unhealthy.

You can find the complete source code here

The target folders for this post are health-check-server and log-server.

This is one of Docker learning series posts. If you want to learn Docker deeply, I highly recommend Learn Docker in a month of lunches.

  1. Start Docker from scratch
  2. Docker volume
  3. Bind host directory to Docker container for dev-env
  4. Communication with other Docker containers
  5. Run multi Docker containers with compose file
  6. Container’s dependency check and health check
  7. Override Docker compose file to have different environments
  8. Creating a cluster with Docker swarm and handling secrets
  9. Update and rollback without downtime in swarm mode
  10. Container optimization
  11. Visualizing log info with Fluentd, Elasticsearch and Kibana

Health check server

I created a simple HTTP server called health-check-server. It listens on port 80, and when it receives a request to http://localhost/hello/boss it goes into a failing state. The complete code follows. When the name in the last request was BOSS, requesting /status returns HTTP error 500.

import * as restify from "restify";
import { Logger } from "./Logger";

const server = restify.createServer();
const logger = new Logger("restify-server");
let isLastRequestBoss = false;

function respond(
    req: restify.Request,
    res: restify.Response,
    next: restify.Next
) {
    logger.log(`GET request with param [${req.params.name}]`);
    isLastRequestBoss = false;
    if ((req.params.name as string).toUpperCase() === "BOSS") {
        isLastRequestBoss = true;
    }
    res.send('hello ' + req.params.name);
    next();
}
function healthCheck(
    req: restify.Request,
    res: restify.Response,
    next: restify.Next
) {
    res.send(isLastRequestBoss ? 500 : 200);
    next();
}

server.get('/hello/:name', respond);
server.get('/status', healthCheck);
server.head('/hello/:name', respond);

const port = 80;
server.listen(port, function () {
    logger.log(`${server.name} listening at ${server.url}`);
});

How to add HEALTHCHECK function

The Dockerfile looks like this.

FROM yuto/nodejs
EXPOSE 80
ENV LOGGER_API_URL="http://log-server:80/"

CMD node ./lib/server.js
HEALTHCHECK --interval=1s --timeout=5s --start-period=5s --retries=3 \
            CMD curl --fail http://localhost/status || exit 1

WORKDIR /app
COPY ./node_modules/ /app/node_modules/
COPY ./dist/ /app/

We can specify a command that checks the container's status. HEALTHCHECK expects the following exit codes. Therefore, || exit 1 is appended to the curl command, because curl can return other exit codes.

  • 0: success
  • 1: unhealthy
  • 2: reserved – shouldn’t be used
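As a minimal illustration of that mapping (pure shell, no Docker required; `health_cmd` is a hypothetical wrapper, not code from the repo), the `|| exit 1` pattern collapses any non-zero curl exit code, such as 22 for `--fail` on an HTTP 5xx or 6 for a DNS failure, down to the single "unhealthy" code:

```shell
#!/bin/sh
# health_cmd: hypothetical wrapper around a check command.
# '|| exit 1' maps every non-zero exit code to 1, the only
# "unhealthy" code HEALTHCHECK understands.
health_cmd() {
    "$@" || exit 1
}

# Simulate curl failing with exit code 22 (HTTP error under --fail):
( health_cmd sh -c 'exit 22' )
echo "mapped exit code: $?"   # prints 1, not 22
```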

Several options are specified for the health check. Let's look at what they mean. The same settings can also be defined in a docker-compose file.

  • interval: how often the health check command runs. The first execution happens after this interval has elapsed since the container started
  • timeout: if the health check command takes longer than this, the check is treated as failed
  • start-period: failures during this period are not counted toward the retry limit, giving the container time to start up
  • retries: if the health check command fails this many times in a row, the container status becomes unhealthy
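Expressed in a docker-compose file, the same health check might look like the sketch below (service and image names are assumed from this post):

```yaml
services:
  health-check-server:
    image: health-check-server:v1
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost/status"]
      interval: 1s
      timeout: 5s
      start_period: 5s
      retries: 3
```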

Check the Health status

Let’s start the containers and send some requests.

# Run these commands if you haven't created log-server image
cd log-server

# docker image build -t log-server .
npm run dbuild

# docker container run --rm -p 8001:80 --name log-server --network log-test-nat log-server
npm run dstart

# Run these commands in different window
cd health-check-server

# docker image build -t health-check-server:v1 .
npm run dbuild

# docker container run --rm -p 8003:80 --name health-check-server --network log-test-nat health-check-server:v1
npm run dstart

Let's check the health status before sending any requests. The health status can be checked with docker inspect <container name>. The result looks like the output below: Status is running and Health.Status is healthy.

$ docker inspect health-check-server
...
"State": {
    "Status": "running",
    "Running": true,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": false,
    "Pid": 7257,
    "ExitCode": 0,
    "Error": "",
    "StartedAt": "2020-11-11T19:12:45.3440068Z",
    "FinishedAt": "0001-01-01T00:00:00Z",
    "Health": {
        "Status": "healthy",

Send some requests.

# Check the current status
$ curl --fail http://localhost:8003/status

# Turn the status fail
$ curl  http://localhost:8003/hello/BOSS
"hello BOSS"

$ curl --fail http://localhost:8003/status
curl: (22) The requested URL returned error: 500 Internal Server Error

# Turn the status success again
$ curl http://localhost:8003/hello/hey
"hello hey"
$ curl --fail http://localhost:8003/status

Because the health check runs every second, the container becomes unhealthy very quickly; normally the interval is longer. This time, Status is still running but Health.Status is unhealthy because http://localhost/status has returned error code 500 many times. If this container were managed by Docker Swarm, it would be replaced with a new container.

$ docker inspect health-check-server
...
"State": {
    "Status": "running",
    "Running": true,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": false,
    "Pid": 5120,
    "ExitCode": 0,
    "Error": "",
    "StartedAt": "2020-11-11T18:43:32.6745183Z",
    "FinishedAt": "0001-01-01T00:00:00Z",
    "Health": {
        "Status": "unhealthy",
        "FailingStreak": 166,
        "Log": [
            {
                "Start": "2020-11-11T18:53:06.4267859Z",
                "End": "2020-11-11T18:53:06.5980225Z",
                "ExitCode": 1,
                "Output": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n      
                    Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\ncurl: (22) The requested URL returned error: 500 Internal Server Error\n"
            },

By the way, there is an easier way to see the health status.

$ docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS                   PORTS                  NAMES
ee1e1d782097        health-check-server:v1   "docker-entrypoint.s…"   6 seconds ago       Up 4 seconds (healthy)   0.0.0.0:8003->80/tcp   health-check-server

Dependency check before starting a container

A dependency can be defined in a docker-compose file, but that doesn't mean the dependency container is ready to use; it may take a few minutes to become ready for some reason. Since the software in the container can't be used until then, it's necessary to add logic that waits for the dependency before entering the main process. In an orchestration system, the startup order isn't guaranteed. If we can add a dependency check before the container starts, we keep the actual code clean because the check is a separate function. If the container fails to start because a dependency is not ready, the orchestration system starts a new container. It may fail again, but the container can start up in the end.
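One way to sketch that wait logic (a hypothetical helper, not code from the repo) is a small retry loop that polls the dependency a few times before giving up, rather than failing on the first attempt:

```shell
#!/bin/sh
# wait_for: hypothetical helper that retries a command until it
# succeeds, or gives up after a fixed number of attempts.
wait_for() {
    retries=5
    until "$@"; do
        retries=$((retries - 1))
        if [ "$retries" -le 0 ]; then
            return 1          # dependency never came up
        fi
        sleep 1               # back off before the next attempt
    done
}

# Intended use inside a Dockerfile CMD, e.g.:
#   wait_for curl --fail "${LOGGER_API_URL}status" && node ./lib/server.js
```

Whether to retry or fail fast is a trade-off; the Dockerfile below takes the fail-fast approach and lets the orchestrator handle the restart.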

I added the dependency check logic in Dockerfile.v2, which looks like the following.

FROM yuto/nodejs
EXPOSE 80
ENV LOGGER_API_URL="http://log-server:80/"

CMD curl --fail ${LOGGER_API_URL}status && \
    node ./lib/server.js
HEALTHCHECK --interval=1s --timeout=5s --start-period=5s --retries=3 \
            CMD curl --fail http://localhost/status || exit 1

WORKDIR /app
COPY ./node_modules/ /app/node_modules/
COPY ./dist/ /app/

The point is to run an additional command before the desired command. At startup, the container sends a request with curl --fail ${LOGGER_API_URL}status, and if that fails the container stops.

docker stop log-server
docker stop health-check-server
cd health-check-server

# docker image build -t health-check-server:v2 -f Dockerfile.v2 .
npm run dbuild2

# docker container run --rm -p 8003:80 --name health-check-server --network log-test-nat health-check-server:v2
$ npm run dstart2

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0curl: (6) Could not resolve host: log-server
npm ERR! code ELIFECYCLE
npm ERR! errno 6
npm ERR!  health-check-server@1.0.0 dstart2: `docker container run --rm -p 8003:80 --name health-check-server --network log-test-nat health-check-server:v2`
npm ERR! Exit status 6
...

It failed to start because log-server didn't exist. We need to start the container again to run the service, but it is a good habit to make the error explicit: if the container kept running, the root cause might be hard to find in some cases.

Conclusion

HEALTHCHECK and dependency checks may not be necessary unless we need an orchestration system, but having them makes it easy to move to one later. A simple curl command is used in this example, but any other command works as long as it returns the exit codes 0 or 1 that HEALTHCHECK expects. It could be a small script, a dedicated executable, or something else. It's our choice.
