Docker multiline logs and Filebeat Autodiscover

Shipping container logs to a centralized log analytics solution is not as simple as it is for apps running directly on the host OS. We’ll focus on Docker, because that’s what we use at SlingNode, but the same challenges and solutions apply to Kubernetes.

We have two requirements:

  1. Ship logs from specific containers
  2. Handle multiline logs

When running apps in containers, we want them to write logs to stdout and let the container engine manage them. Docker includes multiple logging mechanisms, which it calls “logging drivers”. See Docker’s docs for details: https://docs.docker.com/config/containers/logging/configure/

The default driver is “json-file”. By default, Docker captures the standard output (and standard error) of all your containers and writes it to files in JSON format. The JSON format annotates each line with its origin (stdout or stderr) and its timestamp, and each log file contains information about only one container.
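
The driver and its options can be changed globally in the Docker daemon config (or per container). As a minimal sketch, assuming we stick with json-file and only want to cap log size, /etc/docker/daemon.json could look like this:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}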

On Linux, Docker creates a log file for each container under /var/lib/docker/containers/{{container_id}}/{{container_id}}-json.log. We can find out the path by inspecting the container.
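
For example, Docker can tell us the path of a given container directly (the container name here is just a placeholder):

docker inspect --format '{{.LogPath}}' my_container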

We cannot simply specify the log path since we don’t know the container_id in advance.

Docker treats each line as a separate log entry. It adds its annotations and creates a JSON object from each line. This breaks multiline log entries into separate JSON objects.
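
To illustrate (this is a made-up stack trace, not the actual log shown in the screenshots below), the json-file driver writes one JSON object per application line, along these lines:

{"log":"2023-01-15 10:02:33.412 ERROR Block processing failed\n","stream":"stderr","time":"2023-01-15T10:02:33.412345678Z"}
{"log":"java.lang.IllegalStateException: invalid state root\n","stream":"stderr","time":"2023-01-15T10:02:33.412398765Z"}
{"log":"\tat org.example.BlockProcessor.process(BlockProcessor.java:42)\n","stream":"stderr","time":"2023-01-15T10:02:33.412401234Z"}

A log shipper that treats each of these objects as a separate event loses the fact that they belong to one logical entry.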

The screenshot below shows an example of a raw application multiline log entry (abbreviated for legibility):

This is how it looks in the Docker log:

The good folks at Elastic have solved this problem with the Filebeat Autodiscover feature.

Autodiscover allows you to track containers and adapt settings as changes happen. By defining configuration templates, the autodiscover subsystem can monitor services as they start running.

Filebeat uses Docker’s API to discover containers and creates harvesters based on the conditions specified in the configuration file. If we don’t specify any conditions, it will read and ship logs from all containers. We can use the same conditions that processors use. Furthermore, Filebeat gives us access to the Docker labels. I have to note here that labels containing dots don’t seem to work even with “labels.dedot” set to true.
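
For reference, dedot is configured on the autodiscover provider itself. A minimal sketch (verify the option and its default against your Filebeat version):

filebeat.autodiscover:
  providers:
    - type: docker
      labels.dedot: true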

Filebeat is aware of the Docker log format and understands how to apply its parsers. This allows us to use Filebeat’s multiline parsers as we normally would.

We annotate our containers with custom labels and then use them to tell Filebeat which logs to ship and when to apply the multiline logic.

    labels:
      slingnode_client: "nethermind"
      slingnode_layer: "execution"
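
In a Compose file this could look as follows (the service name and image tag are illustrative, not our actual setup):

services:
  execution:
    image: nethermind/nethermind:latest
    labels:
      slingnode_client: "nethermind"
      slingnode_layer: "execution"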

This is how the Docker labels look in the Filebeat message:

"container": {
      "labels": {
        "com_docker_compose_oneoff": "False",
        "com_docker_compose_depends_on": "",
        "org_label-schema_url": "https://besu.hyperledger.org/",
        "org_label-schema_vcs-ref": "5161b613",
        "com_docker_compose_project_working_dir": "/opt/observability/",
        "org_label-schema_build-date": "2022-12-09T22:30Z",
        "com_docker_compose_image": "sha256:c0fe05df4b09deac51a24ec59932ed7cc0e6807385a94963d64d5abba185f660",
        "com_docker_compose_project_config_files": /opt/observability/docker-compose.yml",
        "com_docker_compose_config-hash": "f6e73137009c4750c430b2e5ffcd3337f1e7a0b6d180b45e7579cff12aea6cdd",
        "slingnode_client": "besu",
        "com_docker_compose_service": "execution",
        "org_label-schema_vendor": "Hyperledger",
        "org_label-schema_vcs-url": "https://github.com/hyperledger/besu.git",
        "slingnode_layer": "execution",
        "org_label-schema_name": "Besu",
        "com_docker_compose_version": "2.14.1",
        "com_docker_compose_container-number": "1",
        "org_label-schema_description": "Enterprise Ethereum client",
        "org_label-schema_schema-version": "1.0",
        "org_label-schema_version": "22.10.3",
        "com_docker_compose_project": "dockerlogging"
      }
}

We use the following Filebeat autodiscover config.

filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        # besu and nethermind throw multiline exceptions
        - condition:
            or:
              - equals:
                  docker.container.labels.slingnode_client: besu
              - equals:
                  docker.container.labels.slingnode_client: nethermind
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log
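              # negate: true + match: after means any line that does NOT start
              # with a YYYY-MM-DD date is appended to the preceding line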
              multiline.type: pattern
              multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
              multiline.negate: true
              multiline.match: after
              fields:
                log_type: docker
              fields_under_root: true
        - condition:
            or:
              - equals:
                  docker.container.labels.slingnode_client: geth
              - equals:
                  docker.container.labels.slingnode_client: erigon
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log
              fields:
                log_type: docker
              fields_under_root: true
        - condition:
            equals:
              docker.container.labels.slingnode_layer: consensus
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log
              fields:
                log_type: docker
              fields_under_root: true
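
Before restarting Filebeat it is worth validating the file. Filebeat has a built-in config check (the config path below is an example):

filebeat test config -c /etc/filebeat/filebeat.yml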

After applying the above config, our multiline log entries are correctly reassembled by Filebeat:

As of writing, Filebeat autodiscover supports the following providers: Docker, Kubernetes, Jolokia and Nomad.
