Distributed logging with systemd, journald, and eliot
________________________________________________________________________________________________________________________

It's hard to debug distributed systems. One possible solution is to use the Python library [eliot] to trace control
flow across the system. The problem is that eliot is set up primarily to write to local files, which is fine for local
systems, but not for systems spread across multiple machines. This guide covers how I consolidated the logs from
several machines so that they can be processed on one local machine.

Terms and scope
________________________________________________________________________________________________________________________

For the purpose of this document, I need to use a few terms. The [sink] is the machine that logs will be forwarded to.
It collects all of the logs that are pushed to it. There should be only one sink. A [source] is one machine that is
forwarding logs. There may be multiple sources.

There is also a matter of scope of this solution. The sink must be publicly accessible (i.e. it has an IP address that
the sources can reach). The sink and sources will need to be configured to communicate via HTTP. HTTPS is possible but
is not something I bothered with.
For reference, this is the software on the sink and on the source.

--------8<--------------------------------------------------------------
sink$ uname -a
Linux sink 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
sink$ systemd --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN
sink$ python3.7 --version
Python 3.7.4
sink$ python3.7 -m pip freeze | grep eliot
eliot==1.11.0
eliot-tree==19.0.0

source$ uname -a
Linux source 4.14.154-128.181.amzn2.x86_64 #1 SMP Sat Nov 16 21:49:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
source$ systemctl --version
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN
-------->8--------------------------------------------------------------

How it works
________________________________________________________________________________________________________________________

On the sink, [systemd-journal-remote] starts an HTTP server on port 19532. This server listens for POST requests to the
[/upload] endpoint with a Content-Type of [application/vnd.fdo.journal]. When it gets a request, it appends the
received entries to a journal file in [/var/log/journal/remote/]. The file is named after the remote host, like
[remote-1.2.3.4.journal].

On the source, [systemd-journal-upload] watches the journal for changes and sends every new log entry to the sink in a
POST request. Separately, a Python script is run that produces eliot logs. These are first written to the local
journal, and then mirrored to the sink.

Back on the sink, the journal files are read and filtered by [journalctl]. The output is then piped to
[eliot-prettyprint] or [eliot-tree].
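The upload step described above can be sketched in Python. This only illustrates the shape of the request (port,
endpoint, and Content-Type as given in this section) and is not how [systemd-journal-upload] is implemented. The host
name [sink.example.com], the helper names, and the sample entry are assumptions made for the example. The body uses
the text form of the Journal Export Format, where each field is a [FIELD=value] line and a blank line ends the entry.
The request is built but never sent.

```python
from urllib.request import Request

SINK_PORT = 19532  # port systemd-journal-remote listens on
CONTENT_TYPE = "application/vnd.fdo.journal"


def export_entry(fields):
    """Serialize one entry in the text form of the Journal Export Format.

    Only values without newlines are handled here; the real format has a
    separate binary-safe encoding for those.
    """
    lines = []
    for name, value in fields.items():
        if "\n" in value:
            raise ValueError("multi-line values need the binary encoding")
        lines.append("{}={}\n".format(name, value))
    # A blank line terminates the entry.
    return ("".join(lines) + "\n").encode("utf-8")


def build_upload_request(sink_host, entries):
    """Build (but do not send) a POST request to the sink's /upload endpoint."""
    url = "http://{}:{}/upload".format(sink_host, SINK_PORT)
    body = b"".join(export_entry(entry) for entry in entries)
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": CONTENT_TYPE})


def remote_journal_name(source_address):
    """Name of the journal file the sink keeps for one source (hypothetical helper)."""
    return "remote-{}.journal".format(source_address)


request = build_upload_request(
    "sink.example.com",  # hypothetical sink host
    [{"MESSAGE": "hello", "SYSLOG_IDENTIFIER": "test.py"}],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"), end="")
print(remote_journal_name("1.2.3.4"))
```

In practice there is no need to write this by hand, since [systemd-journal-upload] takes care of it. The sketch is only
meant to show what travels between source and sink.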
Setup: Sink
________________________________________________________________________________________________________________________

On the sink, install [systemd-journal-remote] or [systemd-journal-gateway] for [apt]-based or [yum]-based systems,
respectively. We need to edit the file [/lib/systemd/system/systemd-journal-remote.service] and change the
[--listen-https=-3] flag to [--listen-http=-3]. The edited file looks like this.

--------8<--------------------------------------------------------------
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Journal Remote Sink Service
Documentation=man:systemd-journal-remote(8) man:journal-remote.conf(5)
Requires=systemd-journal-remote.socket

[Service]
ExecStart=/lib/systemd/systemd-journal-remote \
          --listen-http=-3 \
          --output=/var/log/journal/remote/
User=systemd-journal-remote
Group=systemd-journal-remote
PrivateTmp=yes
PrivateDevices=yes
PrivateNetwork=yes
WatchdogSec=3min

[Install]
Also=systemd-journal-remote.socket
-------->8--------------------------------------------------------------

We then need to enable and start the service. If systemd had already loaded the unit before the edit, run
[sudo systemctl daemon-reload] first so that the change is picked up.

--------8<--------------------------------------------------------------
sink$ sudo systemctl enable systemd-journal-remote
sink$ sudo systemctl start systemd-journal-remote
-------->8--------------------------------------------------------------

If the service exits with code 1 and no log message, there are two likely causes. One is that the [--listen-https=-3]
flag wasn't changed to [--listen-http=-3], so the service fails when it tries to access the HTTPS certificate. The
other is that the [/var/log/journal/remote] folder doesn't exist, or doesn't have the right permissions. The error
message you get might look like this.
--------8<--------------------------------------------------------------
sink$ sudo systemctl status systemd-journal-remote.service
* systemd-journal-remote.service - Journal Remote Sink Service
   Loaded: loaded (/lib/systemd/system/systemd-journal-remote.service; indirect; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2020-01-14 12:33:02 CST; 2s ago
     Docs: man:systemd-journal-remote(8)
           man:journal-remote.conf(5)
  Process: 120042 ExecStart=/lib/systemd/systemd-journal-remote --listen-https=-3 --output=/var/log/journal/remote/ (code=exited, status=1/FAILURE)
 Main PID: 120042 (code=exited, status=1/FAILURE)

Jan 14 12:33:02 sink systemd[1]: Started Journal Remote Sink Service.
Jan 14 12:33:02 sink systemd[1]: systemd-journal-remote.service: Main process exited, code=exited, status=1/FAILURE
Jan 14 12:33:02 sink systemd[1]: systemd-journal-remote.service: Unit entered failed state.
Jan 14 12:33:02 sink systemd[1]: systemd-journal-remote.service: Failed with result 'exit-code'.
-------->8--------------------------------------------------------------

If the folder didn't exist, it can be created with these commands.

--------8<--------------------------------------------------------------
sink$ sudo mkdir /var/log/journal/remote
sink$ sudo chown systemd-journal-remote:systemd-journal-remote /var/log/journal/remote
-------->8--------------------------------------------------------------

This might not persist across reboots. If it doesn't, you could add an entry for the folder to a tmpfiles.d
configuration file. I haven't tested this yet, so I don't know if it's needed.

Setup: Source
________________________________________________________________________________________________________________________

On the source, install [systemd-journal-remote] or [systemd-journal-gateway] for [apt]-based or [yum]-based systems,
respectively. We need to configure the [/etc/systemd/journal-upload.conf] file to point towards our sink.
--------8<--------------------------------------------------------------
[Upload]
URL=http://sink.example.com
# URL=
# ServerKeyFile=/etc/ssl/private/journal-upload.pem
# ServerCertificateFile=/etc/ssl/certs/journal-upload.pem
# TrustedCertificateFile=/etc/ssl/ca/trusted.pem
-------->8--------------------------------------------------------------

After this, you need to enable and start the service.

--------8<--------------------------------------------------------------
source$ sudo systemctl enable systemd-journal-upload
source$ sudo systemctl restart systemd-journal-upload
-------->8--------------------------------------------------------------

You can then test that the forwarding is working by using [systemd-cat] like this.

--------8<--------------------------------------------------------------
source$ echo hello | systemd-cat
-------->8--------------------------------------------------------------

Then on the sink, check that a file was created in [/var/log/journal/remote/].

Producing eliot logs
________________________________________________________________________________________________________________________

First we need to install Python and then install eliot with the extra [journald] dependencies.

--------8<--------------------------------------------------------------
source$ sudo yum install python37
source$ python3.7 -m pip install --user 'eliot[journald]'
-------->8--------------------------------------------------------------

Then we can create a [test.py] file that produces a few eliot logs.

--------8<--------------------------------------------------------------
"""
Write some logs to journald.
"""

from __future__ import print_function

from eliot import log_message, start_action, add_destinations
from eliot.journald import JournaldDestination

# Send every eliot message to the local journal.
add_destinations(JournaldDestination())


def divide(a, b):
    # The action records its start and whether it succeeded or failed.
    with start_action(action_type="divide", a=a, b=b):
        return a / b


print(divide(10, 2))
log_message(message_type="inbetween")
# Deliberately raises ZeroDivisionError so that a failed action is logged.
print(divide(10, 0))
-------->8--------------------------------------------------------------

Then we run that file on the source like normal. It ends with a ZeroDivisionError traceback, which is expected.

--------8<--------------------------------------------------------------
source$ python3.7 test.py
-------->8--------------------------------------------------------------

Showing eliot logs
________________________________________________________________________________________________________________________

On the sink, we can see all journal entries from the remote hosts using [journalctl] with a flag that specifies the
root of our journal entries, [-D /var/log/journal/remote/].

--------8<--------------------------------------------------------------
sink$ sudo journalctl -D /var/log/journal/remote/
-------->8--------------------------------------------------------------

We can look at just the eliot logs from our script using the filter [SYSLOG_IDENTIFIER=test.py]. To render them with
eliot-tree, we specify the output format [--output cat], which makes journalctl print only the message text of each
entry. Then we can pipe the result into eliot-tree like normal.
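To make the role of [--output cat] concrete: for entries written by eliot, the message text is one JSON object per
line. The Python sketch below reads such lines and groups them by task. It is a rough stand-in for the first step a
tool like eliot-tree has to perform, not its actual implementation. The function name and the sample lines are made up
for illustration; [task_uuid] and [task_level] are the fields eliot attaches to every message.

```python
import json
from collections import defaultdict


def group_by_task(lines):
    """Group eliot JSON messages by task_uuid, ordered by task_level."""
    tasks = defaultdict(list)
    for line in lines:
        try:
            message = json.loads(line)
        except ValueError:
            continue  # not JSON, so not an eliot message
        if isinstance(message, dict) and "task_uuid" in message:
            tasks[message["task_uuid"]].append(message)
    for messages in tasks.values():
        messages.sort(key=lambda m: m["task_level"])
    return dict(tasks)


# Hypothetical sample of what `journalctl --output cat` might print.
sample = [
    '{"task_uuid": "t1", "task_level": [2], "action_type": "divide", "action_status": "succeeded"}',
    'hello',
    '{"task_uuid": "t1", "task_level": [1], "action_type": "divide", "action_status": "started", "a": 10, "b": 2}',
    '{"task_uuid": "t2", "task_level": [1], "message_type": "inbetween"}',
]

for task_uuid, messages in group_by_task(sample).items():
    print(task_uuid)
    for message in messages:
        name = message.get("action_type") or message.get("message_type")
        level = "/".join(str(part) for part in message["task_level"])
        print("    {}/{} {}".format(name, level, message.get("action_status", "")))
```

Replacing [sample] with [sys.stdin] would let the sketch read straight from a journalctl pipe. With eliot-tree itself,
the full pipeline looks like this.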
--------8<--------------------------------------------------------------
sink$ sudo journalctl -D /var/log/journal/remote/ --output cat SYSLOG_IDENTIFIER=test.py | eliot-tree --ascii --color never
f6cdd52b-babb-4a66-bcab-52a67bedee2b
+-- divide/1 > started 2020-01-14 19:45:49Z x 0.001s
    |-- a: 10
    |-- b: 2
    +-- divide/2 > succeeded 2020-01-14 19:45:49Z

eef20b8b-b1e9-4987-a840-4d53e9f0612f
+-- inbetween/1 2020-01-14 19:45:49Z

9092d9a9-11e8-4e63-bd9b-297915d5314a
+-- divide/1 > started 2020-01-14 19:45:49Z x 0.000s
    |-- a: 10
    |-- b: 0
    +-- divide/2 > failed 2020-01-14 19:45:49Z
        |-- exception: builtins.ZeroDivisionError
        +-- reason: division by zero
-------->8--------------------------------------------------------------

References
________________________________________________________________________________________________________________________

[0]: https://eliot.readthedocs.io/en/stable/outputting/journald.html
     The main source of documentation for using eliot with journald. Shows how to use [JournaldDestination].

[1]: https://serverfault.com/a/758559
     Walks through how to configure and set up [systemd-journal-upload] and [systemd-journal-remote].

[2]: https://manpages.debian.org/testing/systemd-journal-remote/systemd-journal-remote.service.8.en.html
     Man page for [systemd-journal-remote(8)].

[3]: https://manpages.debian.org/testing/systemd-journal-remote/systemd-journal-upload.service.8.en.html
     Man page for [systemd-journal-upload(8)].

[4]: https://serverfault.com/a/573951
     Shows how to use [systemd-cat(1)].

[5]: https://unix.stackexchange.com/a/200107
     Shows how to use the [-D /path/to/dir] flag.

[6]: https://github.com/jonathanj/eliottree
     The main source of documentation and code for [eliot-tree].