.de filesystem
[\\$*]
..
.
.de cli
[\\$*]
..
.
.de tech
[\\$*]
..
.
.title Distributed logging with systemd, journald, and eliot
It's hard to debug distributed systems. One possible solution is to use the Python library
.tech eliot
to trace control flow across the system. The problem is that eliot is set up primarily to write to local files, which is fine for local systems, but not for systems spread across multiple machines.
.P
This guide covers how I consolidated logs from several machines so that they can be processed on one machine.
.title Terms and scope
For the purpose of this document, I need to define a few terms.
.P
The
.tech sink
is the machine that logs will be forwarded to. It collects all of the logs that are pushed to it. There should be only one sink.
.P
A
.tech source
is one machine that is forwarding logs. There may be multiple sources.
.P
There is also the matter of this solution's scope.
.P
The sink must be reachable from the sources (i.e. it has an IP address they can connect to).
.P
The sink and sources will need to be configured to communicate via HTTP. HTTPS is possible but is not something I bothered with.
.P
For reference, these are the versions I used on the sink and on one source.
.scode
sink$ uname -a
Linux sink 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
sink$ systemd --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN
sink$ python3.7 --version
Python 3.7.4
sink$ python3.7 -m pip freeze | grep eliot
eliot==1.11.0
eliot-tree==19.0.0
source$ uname -a
Linux source 4.14.154-128.181.amzn2.x86_64 #1 SMP Sat Nov 16 21:49:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
source$ systemctl --version
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN
.ecode
.title How it works
On the sink,
.tech systemd-journal-remote
starts an HTTP server on port 19532.
This server listens for POST requests to the
.tech /upload
endpoint with a Content-Type of
.tech application/vnd.fdo.journal \&.
When it gets a request, it appends the entries to a journal file located in
.filesystem /var/log/journal/remote/
with a filename that identifies the remote host, like
.filesystem remote-1.2.3.4.journal \&.
.P
On the source,
.tech systemd-journal-upload
watches the journal for changes and makes a POST request to the sink for every new log entry. Separately, a Python script runs and produces eliot logs, which are first written to the local journal and then mirrored to the sink.
.P
Back on the sink, these journal files are read and filtered by
.tech journalctl \&.
The output is then piped to
.tech eliot-prettyprint
or
.tech eliot-tree \&.
.title Setup: Sink
On the sink, install
.tech systemd-journal-remote
or
.tech systemd-journal-gateway
for
.tech apt
-based or
.tech yum
-based systems, respectively. We need to edit the file
.filesystem /lib/systemd/system/systemd-journal-remote.service
and change the
.cli --listen-https=-3
flag to
.cli --listen-http=-3 \&.
.scode
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Journal Remote Sink Service
Documentation=man:systemd-journal-remote(8) man:journal-remote.conf(5)
Requires=systemd-journal-remote.socket

[Service]
ExecStart=/lib/systemd/systemd-journal-remote \\
          --listen-http=-3 \\
          --output=/var/log/journal/remote/
User=systemd-journal-remote
Group=systemd-journal-remote
PrivateTmp=yes
PrivateDevices=yes
PrivateNetwork=yes
WatchdogSec=3min

[Install]
Also=systemd-journal-remote.socket
.ecode
We then need to enable and start the services.
.scode
sink$ sudo systemctl enable systemd-journal-remote
sink$ sudo systemctl start systemd-journal-remote
.ecode
If you get an exit code of 1 and no log message, there are two likely causes. One is that the
.cli --listen-https=-3
flag wasn't changed to http, so the service fails when it tries to access the HTTPS certificate. The other is that the
.filesystem /var/log/journal/remote
folder doesn't exist or doesn't have the right permissions. The error message you get might look like this.
.scode
sink$ sudo systemctl status systemd-journal-remote.service
* systemd-journal-remote.service - Journal Remote Sink Service
   Loaded: loaded (/lib/systemd/system/systemd-journal-remote.service; indirect; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2020-01-14 12:33:02 CST; 2s ago
     Docs: man:systemd-journal-remote(8)
           man:journal-remote.conf(5)
  Process: 120042 ExecStart=/lib/systemd/systemd-journal-remote --listen-https=-3 --output=/var/log/journal/remote/ (code=exited, status=1/FAILURE)
 Main PID: 120042 (code=exited, status=1/FAILURE)

Jan 14 12:33:02 sink systemd[1]: Started Journal Remote Sink Service.
Jan 14 12:33:02 sink systemd[1]: systemd-journal-remote.service: Main process exited, code=exited, status=1/FAILURE
Jan 14 12:33:02 sink systemd[1]: systemd-journal-remote.service: Unit entered failed state.
Jan 14 12:33:02 sink systemd[1]: systemd-journal-remote.service: Failed with result 'exit-code'.
.ecode
If the folder didn't exist, it can be created with these commands.
.scode
sink$ sudo mkdir /var/log/journal/remote
sink$ sudo chown systemd-journal-remote:systemd-journal-remote /var/log/journal/remote
.ecode
This might not persist across reboots. You could add the folder to a tmpfiles.d configuration file. I haven't tested this yet, so I don't know if it's needed.
.title Setup: Source
On the source, install
.tech systemd-journal-remote
or
.tech systemd-journal-gateway
for
.tech apt
-based or
.tech yum
-based systems, respectively.
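.P
Before pointing a source at the sink, it can help to confirm from the source that the sink's port accepts TCP connections at all. The following is only a minimal sketch, not part of my original setup: the function name is made up for illustration, sink.example.com is a placeholder hostname, and 19532 is the port mentioned under "How it works".

```python
import socket


def sink_is_reachable(host, port=19532, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This is a hypothetical helper. It only checks that something accepts
    connections on the port, not that journal uploads will succeed.
    """
    try:
        # create_connection resolves the name and connects, or raises OSError
        # (which also covers timeouts and name resolution failures).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder hostname; substitute the address of your own sink.
    print(sink_is_reachable("sink.example.com"))
```

If this prints False, a firewall rule or a stopped service on the sink is a more likely culprit than anything in the upload configuration.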
.P
We need to configure the
.filesystem /etc/systemd/journal-upload.conf
file to point towards our sink.
.scode
[Upload]
URL=http://sink.example.com
# URL=
# ServerKeyFile=/etc/ssl/private/journal-upload.pem
# ServerCertificateFile=/etc/ssl/certs/journal-upload.pem
# TrustedCertificateFile=/etc/ssl/ca/trusted.pem
.ecode
After this, you need to enable and start the service.
.scode
source$ sudo systemctl enable systemd-journal-upload
source$ sudo systemctl restart systemd-journal-upload
.ecode
You can then test that the forwarding is working by using
.tech systemd-cat
like this.
.scode
source$ echo hello | systemd-cat
.ecode
Then on the sink, check that a file was created in
.filesystem /var/log/journal/remote/ \&.
.title Producing eliot logs
First we need to install Python and then install eliot with the extra
.tech journald
dependencies.
.scode
source$ sudo yum install python37
source$ python3.7 -m pip install --user 'eliot[journald]'
.ecode
Then we can create a
.filesystem test.py
file that showcases some eliot logs.
.scode
"""
Write some logs to journald.
"""

from __future__ import print_function
from eliot import log_message, start_action, add_destinations
from eliot.journald import JournaldDestination

# Send every eliot message to the local journal.
add_destinations(JournaldDestination())


def divide(a, b):
    with start_action(action_type="divide", a=a, b=b):
        return a / b

print(divide(10, 2))
log_message(message_type="inbetween")
# Raises ZeroDivisionError on purpose, to log a failed action.
print(divide(10, 0))
.ecode
Then we run that file on the source like normal.
.scode
source$ python3.7 test.py
.ecode
.title Showing eliot logs
On the sink, we can see all journal entries from the remote hosts using
.tech journalctl
with a flag to specify the directory that holds our journal files,
.cli -D /var/log/journal/remote/ \&.
.scode
sink$ sudo journalctl -D /var/log/journal/remote/
.ecode
We can narrow this down to just the eliot logs from our script using the filter
.cli SYSLOG_IDENTIFIER=test.py \&.
We can also render them using eliot-tree by specifying the output format
.cli --output cat \&.
Then we can just pipe into eliot-tree like normal.
.scode
sink$ sudo journalctl -D /var/log/journal/remote/ --output cat SYSLOG_IDENTIFIER=test.py | eliot-tree --ascii --color never
f6cdd52b-babb-4a66-bcab-52a67bedee2b
+-- divide/1 > started 2020-01-14 19:45:49Z x 0.001s
    |-- a: 10
    |-- b: 2
    +-- divide/2 > succeeded 2020-01-14 19:45:49Z

eef20b8b-b1e9-4987-a840-4d53e9f0612f
+-- inbetween/1 2020-01-14 19:45:49Z

9092d9a9-11e8-4e63-bd9b-297915d5314a
+-- divide/1 > started 2020-01-14 19:45:49Z x 0.000s
    |-- a: 10
    |-- b: 0
    +-- divide/2 > failed 2020-01-14 19:45:49Z
        |-- exception: builtins.ZeroDivisionError
        +-- reason: division by zero
.ecode
.title References
.ref https://eliot.readthedocs.io/en/stable/outputting/journald.html
The main source of documentation for using eliot with journald. Shows how to use
.tech JournaldDestination \&.
.P
.ref https://serverfault.com/a/758559
Walks through how to configure and set up
.tech systemd-journal-upload
and
.tech systemd-journal-remote \&.
.P
.ref https://manpages.debian.org/testing/systemd-journal-remote/systemd-journal-remote.service.8.en.html
Man page for
.tech systemd-journal-remote(8) \&.
.P
.ref https://manpages.debian.org/testing/systemd-journal-remote/systemd-journal-upload.service.8.en.html
Man page for
.tech systemd-journal-upload(8) \&.
.P
.ref https://serverfault.com/a/573951
Shows how to use
.tech systemd-cat(1) \&.
.P
.ref https://unix.stackexchange.com/a/200107
Shows how to use the
.cli -D /path/to/dir
flag.
.P
.ref https://github.com/jonathanj/eliottree
The main source of documentation and code for
.tech eliot-tree \&.