My Docker Setup
===============

For my work, I've found that using Docker for almost everything greatly improves my workflow. This means embracing the idea of making new images/containers all the time and running every application inside a container. This blog post goes over some of the things I've learned over time that make Docker usable for everyday projects.

First, it will help to go over some of my goals and constraints:

- The systems I generally work with are part of a small-scale, self-managed cluster that my research group shares. We work hard to keep the same things installed on all the machines, so that if we run into a problem, we can execute the same fix everywhere.

- The projects I work on require lots of dependencies, often ones that conflict with those of other projects. As an example, some projects may work with a pre-packaged library version while others need to build the latest version. Although I could manage this myself using LD_LIBRARY_PATH and other variables, I would prefer to not even have to think about it.

- Eventually, most of my projects become microservices and may need to scale up to multiple processes (on a single machine, or on multiple machines). Again, I could manage this myself by using a shared filesystem (or building on each machine), starting multiple processes (by hand or with a supervisor like systemd), and sshing to other machines to start more processes (by hand or with something like Puppet). The complexity of building such a system would rival the complexity of using Docker, so I'd rather go with the more common solution.

- Our group tends to hand projects to one another pretty often, so it's nice to not have to worry about dependencies when someone wants to use my code. Following standard packaging practices would work here as well (i.e. making heavy use of Python's packaging, or using CMake in a standard way), but we rarely use a single language or tool.

- Our projects tend to be around for a long time and are very susceptible to bitrot. We have recently started working on some things that had last been updated several years ago. We can't necessarily fix the older projects using Docker, but we can make things easier for those who need our code in the future.

Disclaimer: I'm well known in my group for switching to a new "right way" of software development every so often, but I think I'm reaching a point of convergence. In the past, my projects were built around the use of Makefiles to ensure that dependencies are automatically handled (for example, we can't start this web server until we've set up the database), and I've had Makefile rules that were nigh-incomprehensible after a few weeks. However, Docker is something I've stuck with since I started using it, and I've tinkered with it enough to know it can handle my workloads.


My Standard Project Structure
-----------------------------

Every project starts with a "go.sh" script that includes several subcommands I've found helpful. In the beginning, I would create this script in a piecemeal fashion, copying from one project to another as I found new patterns. I'll do the same here, starting with a simple version and then showcasing the fully featured one I now have. For this purpose, we will build a project named "foo".

.. code:: bash
  :name: go.sh

  #!/usr/bin/env bash
 
  tag=foo:$USER
 
  build() {
      docker build -t $tag .
  }
 
  run() {
      docker run -it --rm $tag "$@"
  }
 
  "$@"

.. code:: Dockerfile
  :name: Dockerfile

  FROM ubuntu:bionic
 
  RUN apt-get update && \
      apt-get install -y \
          python3.7 \
          python3-pip \
      && \
      rm -rf /var/lib/apt/lists/*
 
  RUN python3.7 -m pip install \
      requests
 
  WORKDIR /app
  COPY main.py ./
 
  ENTRYPOINT ["python3.7", "-u", "main.py"]
  CMD []

.. code:: python
  :name: main.py

  #!/usr/bin/env python3.7
  """
 
  """
 
  import requests
 
 
  def main(url):
      with requests.get(url) as r:
          print(r.content)
 
 
  def cli():
      import argparse
 
      parser = argparse.ArgumentParser()
      parser.add_argument('url')
      args = vars(parser.parse_args())
 
      main(**args)
 
 
  if __name__ == '__main__':
      cli()

This simple file structure handles the common mistakes and challenges when starting to use Docker: using the same tag between building and running, remembering to remove the container once the execution is finished, being able to use stdin when running things in the container, and having distinct images between users.

However, this isn't sufficient for most uses of Docker because it also introduces limitations:

- We lose access to the host file system, so everything has to be in the container (which means rebuilding every time, even for interpreted languages).

- We run into problems when we want to use our tools in a Linux pipeline, because the -t flag expects to be running with a tty (demonstrated after the usage example below).

- It isn't easy to debug problems inside the container (making sure installed libraries end up at the right path is one tricky example).

Its use is as follows:

.. code:: console

   $ # Build the Dockerfile
   $ ./go.sh build

   $ # Run the main.py script with an argument
   $ ./go.sh run https://google.com
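As promised, here is what the pipeline limitation looks like in practice: Docker refuses to allocate a tty when stdin is a pipe, so piping into the container fails with something like the following.

.. code:: console

  $ echo hello | ./go.sh run https://google.com
  the input device is not a TTY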


Improving the Structure
-----------------------

Some of these problems can be fixed pretty easily, so this iteration will do that. With these changes, we will be able to build once and then reuse the image thereafter, even after changing the "main.py" script. If we want to install new dependencies, though, we will still have to rebuild.

.. code:: bash
  :name: go.sh

  #!/usr/bin/env bash
 
  tag=foo:$USER
 
  build() {
      docker build -t $tag .
  }
 
  run() {
      # Mount the current directory, make it the working directory, and run
      # as the host user so created files keep the right ownership.
      docker run \
          -it --rm \
          -v $PWD:$PWD -w $PWD -u $(id -u):$(id -g) \
          $tag "$@"
  }
 
  # Alias chain: "python" -> "python3" -> "python3.7", with the last one
  # actually running inside the container.
  python() { python3 "$@"; }
  python3() { python3.7 "$@"; }
  python3.7() { run python3.7 "$@"; }
 
  "$@"

.. code:: Dockerfile
  :name: Dockerfile

  FROM ubuntu:bionic
 
  RUN apt-get update && \
      apt-get install -y \
          python3.7 \
          python3-pip \
      && \
      rm -rf /var/lib/apt/lists/*
 
  RUN python3.7 -m pip install \
      requests
 
  ENTRYPOINT []
  CMD []

.. code:: python
  :name: main.py

  #!/usr/bin/env python3.7
  """
 
  """
 
  import requests
 
 
  def main(url, outfile):
      with requests.get(url) as r:
          content = r.content
 
      print(content, file=outfile)
 
 
  def cli():
      import argparse
 
      parser = argparse.ArgumentParser()
      parser.add_argument('url')
      parser.add_argument('--outfile', type=argparse.FileType('w'))
      args = vars(parser.parse_args())
 
      main(**args)
 
 
  if __name__ == '__main__':
      cli()

With these changes, we can now run things inside the Docker container as our regular user, which lets us write files inside the container without their permissions getting messed up outside of it. We can also run whatever commands we want inside the container (for example, if we install ffmpeg in the container, we can run that directly rather than just our "main.py" script).
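As a hypothetical example, if ffmpeg were added to the Dockerfile's apt-get install list, you could transcode a video in the current (mounted) directory straight through the container:

.. code:: console

  $ ./go.sh run ffmpeg -i input.mp4 output.webm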

This is what it looks like to use the new script:

.. code:: console

  $ # Build the Dockerfile
  $ ./go.sh build

  $ # Run our main.py script
  $ ./go.sh python -u main.py --outfile google.txt https://google.com

As it is now, the "go.sh" script is sufficient for most of my uses of Docker. However, there are still some drawbacks, mostly owing to how feature deficient it is. That minimalism can be considered a feature itself, but in my case it means I end up copying features into the script as I need them, ending up with subtly different versions everywhere. For example, I usually want to create a Docker service out of my images, and I often add a debugging utility called "inspect" to hop into a container and debug problems.

The intent with this script is to edit it whenever you want to add something, like publishing a TCP port from the container. As it's set up now, you would do this by adding the argument "-p 8888:8888" to the "docker run" line. However, I do this often enough that it would be nice if it were a little more automatic.
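For instance, a hand-edited "run" might look like the following (the port number is just an example):

.. code:: bash

  run() {
      docker run \
          -it --rm \
          -v $PWD:$PWD -w $PWD -u $(id -u):$(id -g) \
          -p 8888:8888 \
          $tag "$@"
  }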

Making it General
-----------------

This script is possibly (read: definitely) overengineered, but it has served as the basis for many of my projects already.

By (ab)using a Bash feature, I'm able to add lots of conditional flags. This feature is the "use alternate value" form of parameter expansion, written ${var:+word}, and it looks like:

.. code:: console

  $ myset=1
  $ myunset=
  $ echo myset is: X${myset:+set}X
  myset is: XsetX
  $ echo myunset is: X${myunset:+set}X
  myunset is: XX
  $ reverse=1
  $ echo sort ${reverse:+-r} file
  sort -r file

This allows us to turn a flag variable that's either "1" or "" into a conditional flag for a command we want to run.
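Applied to "go.sh", that could look like the following sketch, where "interactive" and "mount" are hypothetical flag variables you would set near the top of the script:

.. code:: bash

  interactive=1  # set to "" when running in a pipeline (no tty)
  mount=1        # set to "" to leave the host filesystem unmounted

  run() {
      # Each ${var:+...} group only appears when its flag variable is set.
      docker run \
          ${interactive:+-it} --rm \
          ${mount:+-v $PWD:$PWD -w $PWD} \
          -u $(id -u):$(id -g) \
          $tag "$@"
  }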

Due to the length of the files, I won't include them inline, merely providing links to the code:

- go.sh
- Dockerfile

This setup uses most of the Docker concepts I know and have used effectively. In particular, it supports:

- Building images: for runtime use, or for distribution/standalone use.

- Running containers: interactively, as part of a pipeline, as the local user, or with the current working directory mounted.

- Service management: pushing an image to a registry, creating a service, destroying a service, scaling a service, checking the logs of a service (a sketch of these subcommands follows this list).
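Since the full "go.sh" is only linked above, here is a minimal sketch of what the service subcommands might look like, assuming a Docker Swarm is already set up, the tag points at a registry the swarm can reach, and the service is named after the project:

.. code:: bash

  name=foo

  # Make the image available to every node in the swarm.
  push() { docker push $tag; }

  create() {
      docker service create --name $name --replicas 1 $tag
  }

  destroy() { docker service rm $name; }

  scale() { docker service scale $name=$1; }

  logs() { docker service logs "$@" $name; }

With these, "./go.sh scale 10" expands to "docker service scale foo=10", and "./go.sh logs -f" follows the service logs, matching the usage shown below.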

The Dockerfile now uses multi-stage builds, which allow you to include the instructions for two different images in the same Dockerfile and reuse pieces from one image in another. In our case, it makes it easier to go from an interactive, development usage of Docker to a standalone, production usage. This is done by clearing the "target" variable in "go.sh", which builds every stage in the Dockerfile and tags the last one built.
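Since only links are provided, here is a minimal sketch of the multi-stage idea, assuming a development stage named "base" (matching the target=base usage below) and a final production stage that bakes in the code:

.. code:: Dockerfile

  # Development stage: dependencies only; code is mounted at runtime.
  FROM ubuntu:bionic AS base

  RUN apt-get update && \
      apt-get install -y \
          python3.7 \
          python3-pip \
      && \
      rm -rf /var/lib/apt/lists/*

  RUN python3.7 -m pip install \
      requests

  # Production stage: reuses everything above and bakes in the code.
  FROM base

  WORKDIR /app
  COPY main.py ./

The build subcommand can then use the same parameter substitution trick: with target=base only the first stage is built, and with target= the whole file is built and the final stage is tagged.

.. code:: bash

  target=base  # clear this (target=) to build the production image

  build() {
      docker build ${target:+--target $target} -t $tag .
  }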

Here's how the image might be used:

.. code:: console

  $ # Build the development image
  $ ./go.sh build

  $ # Run a Python script interactively
  $ ./go.sh python interactive.py

  $ # Run a Python script in a pipeline
  $ echo hello world | ./go.sh script python3.7 uppercase.py

  $ # Get an interactive shell in the container
  $ ./go.sh inspect
  docker$ echo hello | python3.7 uppercase.py > foo
  docker$ cat foo
  HELLO
  docker$ exit
  $ cat foo
  cat: foo: No such file or directory

  $ # Build a production image
  $ vi go.sh  # change target=base to target=
  $ ./go.sh build
  $ ./go.sh run python3.7 main.py  # Uses the production image you just built

  $ # Create a service
  $ ./go.sh build
  $ ./go.sh push  # Make image available to the swarm
  $ ./go.sh create  # Create the service
  $ ./go.sh logs -f  # Watch the logs come in
  $ ./go.sh scale 10  # Scale up to 10 instances
  $ ./go.sh destroy  # Destroy the service


Conclusions
-----------

So far, I'm pretty happy with this script and how useful it's been for me. It's taken a messy workflow and standardized it in a way that is easy to share and collaborate on with others. Though I've added a lot of complexity in the end, the core, hackable aspects of the script are still there and aid in fast prototyping.