Musings on cloud initialization
________________________________________________________________________________________________________________________

I spent a while reading and learning about how to set up cloud instances. This document summarizes some of those
things.

The Manual Way
________________________________________________________________________________________________________________________

This is the way I've been doing things up until now. If I need the machine set up a certain way, then I'll just start
an instance, SSH in, and run the commands I need. Then, if I need to start up many instances like that, I'll make a
[snapshot] and convert that to an [AMI].

This way definitely works, but it means that if I made a mistake on the 2nd command, I have to manually re-run
everything up until that point. It also means that explaining what I did is next to impossible unless I take proper
notes.

The nice thing is that it is easy to debug and test things when you don't know how they're going to work. At every step
along the way, I can manually test how it's working.

I also find that it's hard to (de)compose. The latest example is that I set up an instance with two main features: logs
get forwarded to a single instance, and it runs an application on start up. If I want to decompose this into each
separate part, and maybe run a different application with logging, or run two applications on start up but not forward
logs, then I'm at a loss.

Pros:
  Easy to understand.
  Easy to test.
Cons:
  Hard to explain.
  Hard to reproduce.
  Hard to (de)compose.

Aside: Running applications on startup
________________________________________________________________________________________________________________________

I like to use [systemd] to do this because I understand it to a certain level. I also know that it will get things
right with regard to restarting my application if it crashes.
I want to be able to use systemd user services to run things because it means that I don't have to manage everything as
root. I ran into problems where I couldn't get it to actually start my user services on boot, and I'm not sure why.
There might have been something wrong with the [linger] setting or something to that effect. In the end, I just used
regular system services, which meant that I put a [.service] file in [/etc/systemd/system/]. A simple one looks like
this.

--------8<--------------------------------------------------------------
[Unit]
AssertPathExists=/usr/local/bin/myapplication

[Service]
ExecStart=/usr/local/bin/myapplication --my-flags
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target
-------->8--------------------------------------------------------------

The [Type=simple] setting is implied by the [ExecStart=] one. I also find that this is the right level of restart
logic. I had tinkered in the past with some restart settings like [StartLimitIntervalSec=] and [StartLimitBurst=]
because they are referenced nearby in the man page, but they are almost certainly the wrong choice for the types of
applications I write.

After writing this file to [/etc/systemd/system/myapplication.service], it is a simple matter of enabling and starting
the service to get it to auto-run every time the instance starts.

--------8<--------------------------------------------------------------
$ sudo systemctl enable myapplication
$ sudo systemctl start myapplication
-------->8--------------------------------------------------------------

Automating the manual way
________________________________________________________________________________________________________________________

You can automate the manual way pretty trivially, either via [scp && ssh && bash] or using a library like [paramiko].
Furthermore, you could use [Puppet] or [Chef] or whatever the latest tool like that is and let it handle setting up the
instance.
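As a minimal sketch of the [scp && ssh && bash] flavor (the host name and the three steps are placeholders, not a real
setup), the idea is to keep the whole setup in one script and stream it over SSH; the same script can be dry-run
locally first:

```shell
# Sketch only: "user@host" and the steps are stand-ins for real commands.
cat > setup.sh <<'EOF'
#!/bin/sh
set -eu                      # stop at the first failed step instead of plowing on
echo "step 1: install"       # e.g. pip install the application
echo "step 2: write unit"    # e.g. write the .service file from the aside above
echo "step 3: enable"        # e.g. sudo systemctl enable myapplication
EOF

# Against a real instance:  ssh user@host 'sh -s' < setup.sh
sh setup.sh                  # locally, this just prints the three steps
```

The nice property is that the script is the notes: re-running it from the top replays every step in order, which fixes
the "re-run everything by hand after a mistake" problem from the manual way.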
The main problem with this is that it all happens after boot, when the entire stack is already up, and it requires SSH
access. Beyond that, there is still the problem of having to write/run everything as one cohesive unit, which makes
things harder to debug. One nice way to facilitate testing is to create one instance and copy lines over to a shell
script as you run them, which does work, but it makes it hard to relay what testing was done between steps.

Pros:
  Automatic.
  Simple to understand.
Cons:
  Tests aren't encoded.
  Runs after boot, not on boot.

Automating the Docker way
________________________________________________________________________________________________________________________

If [Docker] is running on the server, then it's pretty easy to push the Docker image to the cloud and then run it from
the server. You can even set up some automation there to automatically pull the latest image, but it's not the most
straightforward.

Automating the cloud way
________________________________________________________________________________________________________________________

This kind of problem has already been solved by cloud people who need to install different software on different
instances. It's also been fairly standardized because of the availability of many good cloud services.

The [cloud-init] approach is a YAML file that encodes many different devops functions and runs through them. These
functions include: write this content to this file, run this command in this directory, create this user with this
name, and more.

I see one large flaw in this approach for general cloud instance configuration: it is non-trivial to have things run in
a particular order or interleaved in a particular way. Consider: to create a systemd service, we need to first
[pip install] the application ("run a command"), then we need to create the systemd service file ("create a file"), and
finally enable the service ("run a command").
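For reference, written as a cloud-config file that scenario might look like the sketch below (the package and unit
names are hypothetical); note that the entries end up grouped by kind under [write_files] and [runcmd], not listed in
the order I actually want them to run:

```yaml
#cloud-config
# Hypothetical names; the point is the grouping, not the specifics.
write_files:
  - path: /etc/systemd/system/myapplication.service
    content: |
      [Service]
      ExecStart=/usr/local/bin/myapplication
      [Install]
      WantedBy=multi-user.target
runcmd:
  - pip install myapplication
  - systemctl enable myapplication
```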
As far as I can tell, cloud-init doesn't support that ordering: it would run both commands one after the other and
create the file either before or after both of them.

In practice, this doesn't have to be a problem because one could create the files using a shell script ("run a
command"), but it does feel at odds with everything else.

There is a very good story about extending cloud-config, however. By creating custom "part handlers," one can define
exactly what should happen when a certain key is found in the YAML file. This means that one could create something
that would allow the interleaved scenario from above to be cleanly defined.

I also think there would be a lot of value in creating a Docker-to-Cloud-Config converter. There is a similar
Docker-to-AMI converter, but it functions like "Automating the manual way" from above.

Note: there is some nuance to the distinction between cloud-init and cloud-config. The cloud-init tool accepts many
different types of input: a shell script, a custom part-handler, a URL to download and run, a cloud-config file, etc.
It also allows you to combine multiple inputs into one "package," which it will run through sequentially. The
cloud-config file supports things like running scripts, creating users, writing to files, installing packages, etc.
Although cloud-config shares many functions with cloud-init, they serve different use cases.

In the ideal case, I would be able to send multiple cloud-config files to cloud-init and have each file set up a
different tool, but I believe cloud-init will merge these files together, thus breaking in the way I described above
because it would interleave things together. I am unsure of this, though, and it should be tested, because that would
be an easy solution to my problem.

Pros:
  Supported by every cloud.
  Can be run earlier in the system setup.
Cons:
  File format is a little hard to understand (compared to a shell script).
  Composability is uncertain.
  Not applicable to non-cloud machines.

Conclusion
________________________________________________________________________________________________________________________

I haven't found the best way to set up a cloud instance. Currently, I'm thinking that cloud-init with cloud-config is
the best way, but only if the composition story makes sense. Otherwise, I'll go with cloud-init running regular shell
scripts, plus a framework that facilitates easier testing in between steps.

Link Dump
________________________________________________________________________________________________________________________

[0]: https://cloudinit.readthedocs.io/en/latest/topics/format.html
Details how the cloud-init format works and how to combine different steps into one file.

[1]: https://cloudinit.readthedocs.io/en/latest/topics/examples.html
A nice set of examples of the cloud-config format.

[2]: https://serverfault.com/a/413408
I ran into a problem where I needed to set custom environment variables for different instances. This answer shows one
way to do this, and it's the way I ended up using. You can create a [/etc/systemd/system/myapplication.service.d]
directory and put [.conf] files inside. Then systemd will read and merge all of these files together, ensuring that
your environment variables get loaded.
Note: you still need to use the right section headers. Don't forget to put the [Service] header before the
[Environment=] settings, or it won't work.

[3]: https://serverfault.com/a/410438
Inside of the instance, I wanted to know what type it is (i.e. [t2.micro] or [t2.medium]). AWS exposes a web server
that has this information at [169.254.169.254]. Here is a bash snippet to store some of these values as environment
variables.
--------8<--------------------------------------------------------------
keys=( $(curl -s http://169.254.169.254/latest/meta-data/) )
for key in "${keys[@]}"; do
    case "$key" in
        (*/*)
            # Skip any multi-level keys
            continue;;
    esac
    value=$(curl -s http://169.254.169.254/latest/meta-data/$key)

    # Before: $key looks like "ami-id"
    # After:  $key looks like "AMI_ID"
    key=${key^^}
    key=${key//-/_}

    # Safely use eval without any worry about escaping anything
    eval "AWS_$key=\$value"
done
-------->8--------------------------------------------------------------

Afterwards, you can use variables like [$AWS_AMI_ID] or [$AWS_INSTANCE_TYPE].

[4]: https://forums.aws.amazon.com/thread.jspa?threadID=250683
Apparently AWS instances take a while to disappear after terminating them, so it's fine if they don't vanish from the
console immediately.

[5]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
AWS has user-data, which is sent to cloud-init, so it should be in one of the cloud-init formats (i.e. a shell script,
a cloud-config, or many things concatenated).
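As I understand it, cloud-init decides how to treat a user-data payload by looking at its first line; a few of the
common markers (this is a from-memory summary, so check the format docs in [0] before relying on it):

```
#!/bin/sh         -> executed as a shell script at boot
#cloud-config     -> parsed as a cloud-config YAML file
#part-handler     -> registered as a custom part handler
#include          -> a list of URLs to download and process
MIME multipart    -> several of the above combined into one payload
```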