on the creation of compute-heavy scientific microservices

My research over the past few years has focused on scientific microservices. To be concrete, this post uses the following definitions:

Many of the most useful scientific tools are only usable as heavyweight, monolithic native applications. Some examples include ParaView, VisIt, and Tableau. Although these tools have improved and now offer a degree of “scriptability” and control, they are still designed for, and used by, single users on their own computers. Because the tools are so heavy, everyone who wants to use them needs a powerful computer of their own. In an organization (whether a business or a school), it would be better to buy one really powerful server and let users borrow its compute resources.

Web services support this kind of resource sharing exceptionally well. In fact, ParaView has adopted this approach in its ParaViewWeb tool. Although ParaViewWeb is very exciting for embedding visualization in many applications, it still falls short in an important respect: it is still intended for only one user per machine. One reason is that, although ParaView now communicates over HTTP, it is still monolithic under the hood and must be treated as such. Hence, merely having a “service” is not sufficient, because the service may still be too large.

Microservices have taken off across many companies and organizations. They differ from traditional services in that each microservice is responsible for a very small domain. For example, a traditional service may be responsible for users, payment processing, and the domain logic of an application, whereas a microservice solution would have at least three separate services: one for users, one for payments, and one for domain logic.

Exposing these scientific tools with a web server is nontrivial. They are often written in C/C++ with high performance libraries that require specific environments to function. For example, a tool might use Open MPI and its executables need to be run with mpirun(1) instead of just being exposed as a shared library.
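As a rough illustration of that constraint, a Python server would have to launch such a tool as a child process through the MPI launcher rather than import it. The sketch below assumes a hypothetical executable named ./mytool and a made-up argument layout; the only real interface it relies on is mpirun's -n option for the process count.

```python
import subprocess

def mpirun_command(executable, nprocs, *args):
    """Build the argv list that launches `executable` under mpirun."""
    return ['mpirun', '-n', str(nprocs), executable, *map(str, args)]

def run_tool(executable, nprocs, *args):
    """Run the tool to completion and return what it printed to stdout."""
    completed = subprocess.run(
        mpirun_command(executable, nprocs, *args),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout

# './mytool' is a hypothetical MPI program; printing the command is enough
# to show what the server would execute.
print(mpirun_command('./mytool', 4, 0.0, 4.0))
```

Starting a fresh set of MPI processes for every request is expensive, which is part of why the later sections look at keeping a tool running between requests.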

This post primarily showcases different methods of operating scientific tools from a Python web server. For simplicity, the code samples target the Flask web framework and a numerical quadrature (integration) routine. Where possible, we try to support different functions, and in some cases, we can even pass in a Python function that the tool can call directly instead of pre-compiling a set of functions.

The methods showcased range from least to most effort and likewise from least to most performance:

Python server w/ Python

In some simple cases, the entire server can be written in Python, including the scientific parts. Thanks to NumPy and SciPy, there is a wide range of scientific code that can be used directly and without much effort.

# File: oops_all_python.py
# Run with: FLASK_APP=oops_all_python flask run
# Test with: curl http://localhost:5000/integrate/0.0/4.0

from flask import Flask
import scipy.integrate

app = Flask(__name__)

def square(x):
    return x * x

@app.route('/integrate/<float:start>/<float:end>')
def integrate(start, end):
    y, err = scipy.integrate.quad(square, start, end)
    return { 'y': y, 'err': err }

Python server w/ C subprocess

For scientific tools whose logic is hard to extract into a reusable library, another option is to convert the tool to a stdin-stdout interface. In this case, the main function is changed to first read lines on stdin, then do the processing, and finally write to stdout. This trick even works if the tool has complex setup logic, such as a lot of file reading and initialization.

For example, a 3D graphics renderer written in an object-oriented manner wouldn't need to be completely restructured to put the stdin-stdout loop in the main function. Instead, its render function, which normally runs and returns quickly, could be commandeered to enter an infinite stdin-stdout loop and handle web requests that way.
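The same idea can be sketched in Python for brevity. The Renderer class, its render method, and the line format are hypothetical stand-ins for an existing tool. The point is that the expensive setup runs once, the existing per-call work is untouched, and only the serve method is new.

```python
import io
import sys

class Renderer:
    def __init__(self):
        # Stand-in for expensive one-time setup (reading files, building scenes).
        self.scale = 2.0

    def render(self, x, y):
        # Stand-in for the tool's existing, quick per-call work.
        return self.scale * (x + y)

    def serve(self, infile=sys.stdin, outfile=sys.stdout):
        # New entry point: one request per line in, one result per line out.
        for line in infile:
            fields = line.split()
            if len(fields) != 2:
                break  # malformed request: stop serving
            x, y = map(float, fields)
            outfile.write('%f\n' % self.render(x, y))
            outfile.flush()  # the parent process is waiting on this line

# Demonstrate with in-memory streams instead of a real pipe.
out = io.StringIO()
Renderer().serve(io.StringIO('1 2\n3 4\n'), out)
print(out.getvalue(), end='')
```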

One of the costs of this method is that data needs to be decoded and encoded more often. Parameters can come into the web server in the URL, HTTP headers, and the HTTP POST data. These then need to be encoded to be easily parsed by the C subprocess. After the subprocess is done, the results need to be encoded to be easily parsed in Python. Finally, they need to be encoded into HTTP from Python. All of this adds latency, but it is rarely a problem because the scientific computation usually costs far more than string formatting and parsing.
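On the Python side, that round trip amounts to two small translation steps. The helper names encode_request and decode_response are made up for this sketch, and the line format (a function id and two bounds in, two numbers out) is an assumption chosen to match the example tool in this section.

```python
def encode_request(funcid, start, end):
    """Turn already-parsed HTTP parameters into one line for the subprocess."""
    return b'%d %f %f\n' % (funcid, start, end)

def decode_response(line):
    """Turn one line of subprocess output back into Python floats."""
    y, err = line.decode('utf-8').split()
    return float(y), float(err)

print(encode_request(0, 0.0, 4.0))              # b'0 0.000000 4.000000\n'
print(decode_response(b'1.000000 0.100000\n'))  # (1.0, 0.1)
```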

Another important thing to note is that this method requires some form of threading lock so only one request is interacting with the subprocess at a time. To mitigate the cost of waiting on the lock, multiple subprocesses could be managed at once, or multiple web server and subprocess pairs could be created (though this would require load balancing).
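The first mitigation, managing several subprocesses at once, might look roughly like the following. This is a sketch under assumptions: WorkerPool is a hypothetical helper, and the worker is a tiny Python child standing in for the compiled tool so that the example runs on its own. A thread-safe queue of idle workers replaces the single lock.

```python
import sys
from queue import Queue
from subprocess import Popen, PIPE

# Stand-in worker: answers each "start end" line with "end-start 0.0".
# A real deployment would launch the compiled tool here instead.
WORKER = [sys.executable, '-u', '-c',
          'import sys\n'
          'for line in sys.stdin:\n'
          '    a, b = map(float, line.split())\n'
          '    print(b - a, 0.0, flush=True)\n']

class WorkerPool:
    def __init__(self, size):
        self.idle = Queue()
        for _ in range(size):
            self.idle.put(Popen(WORKER, stdin=PIPE, stdout=PIPE))

    def request(self, start, end):
        worker = self.idle.get()   # blocks until some worker is free
        try:
            worker.stdin.write(b'%f %f\n' % (start, end))
            worker.stdin.flush()
            reply = worker.stdout.readline()
        finally:
            self.idle.put(worker)  # hand the worker back for the next request
        y, err = reply.decode('utf-8').split()
        return float(y), float(err)

    def close(self):
        while not self.idle.empty():
            worker = self.idle.get()
            worker.stdin.close()   # EOF ends the child's read loop
            worker.wait()
            worker.stdout.close()

pool = WorkerPool(size=2)
print(pool.request(0.0, 4.0))
pool.close()
```

With a pool of size N, up to N requests can be in flight at once, and a request only waits when every worker is busy.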

// File: myquadlib.c
// Compile with: gcc -o myquadlib myquadlib.c
// Test with: echo 0 0 4 | ./myquadlib

#include <stdio.h>  // scanf, printf, fflush, NULL

typedef float (*func_t)(float);

float square(float x) {
    return x * x;
}

func_t funcs[] = {
    square,
    NULL,
};

void quad(func_t func, float start, float end, float *y, float *err) {
    float y0 = func(start);
    float y1 = func(end);
    // ... rest of integration code ...
    *y = 1.0;
    *err = 0.1;
}

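int main(int argc, char **argv) {
    float start, end, y, err;
    int funcid;
    // Number of usable entries in funcs (excluding the NULL terminator).
    int nfuncs = (int)(sizeof(funcs) / sizeof(funcs[0])) - 1;
    for (;;) {
        // One request per line: "<funcid> <start> <end>"; stop on EOF or bad input.
        if (3 != scanf("%d %f %f", &funcid, &start, &end)) break;
        // Reject ids outside the table instead of calling through a bad pointer.
        if (funcid < 0 || funcid >= nfuncs) break;
        quad(funcs[funcid], start, end, &y, &err);
        printf("%f %f\n", y, err);
        fflush(stdout);  // the parent process blocks until this line arrives
    }
    return 0;
}

# File: subprocess_server.py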
# Run with: FLASK_APP=subprocess_server flask run
# Test with: curl http://localhost:5000/integrate/0.0/4.0

from threading import Lock
from subprocess import Popen, PIPE
from enum import IntEnum
from flask import Flask
app = Flask(__name__)

process = Popen(['./myquadlib'], stdin=PIPE, stdout=PIPE)
lock = Lock()

class Func(IntEnum):
    SQUARE = 0

def quad(start, end):
    with lock:
        process.stdin.write(b'%d %f %f\n' % (Func.SQUARE, start, end))
        process.stdin.flush()
        line = process.stdout.readline()
        # split() with no argument also discards the trailing newline
        y, err = line.decode('utf-8').split()
        y = float(y)
        err = float(err)
    return y, err

@app.route('/integrate/<float:start>/<float:end>')
def integrate(start, end):
    y, err = quad(start, end)
    return { 'y': y, 'err': err }

Python server w/ ctypes & C library

When the C code can be extracted into a shared library, another easy option is to call the library directly from Python, with no wrapper code to compile. This is accomplished with the ctypes module from the standard library.

One benefit of ctypes is that it already supports Python callback functions which means that complex APIs can be embedded into Python easily.
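To show the callback mechanism in isolation, the sketch below passes a Python comparison function to qsort from the C standard library. It assumes a POSIX system where ctypes can locate libc; the name py_cmp is made up for the example.

```python
import ctypes
import ctypes.util

# Load the C standard library. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library('c'))

# C signature of the comparator: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    # a and b are pointers to the two elements being compared
    return (a[0] > b[0]) - (a[0] < b[0])

libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), CMPFUNC(py_cmp))
print(list(values))  # [1, 5, 7, 33, 99]
```

The C code calls back into the interpreter once per comparison, so a callback-heavy API pays the ctypes overhead on every one of those calls.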

There is some extra cost associated with ctypes itself. Due to its generality, every call into the C library imposes a performance penalty. Internally, the Python code (the server) calls some other Python code (ctypes), which calls native code inside the interpreter (_ctypes), which finally sets up stack frames and calls the library code (myquadlib).
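One way to get a feel for this penalty is to time a trivial C function through ctypes against its built-in equivalent. The sketch below uses abs from libc purely as a convenient stand-in and assumes a POSIX system. The numbers depend heavily on the machine and Python version, so none are quoted here.

```python
import ctypes
import ctypes.util
import timeit

libc = ctypes.CDLL(ctypes.util.find_library('c'))
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

n = 100_000
# Both calls do almost no real work, so the difference is mostly call overhead.
via_ctypes = timeit.timeit(lambda: libc.abs(-3), number=n)
builtin = timeit.timeit(lambda: abs(-3), number=n)

print('ctypes:  %.3f us/call' % (via_ctypes / n * 1e6))
print('builtin: %.3f us/call' % (builtin / n * 1e6))
```

For a function that integrates for milliseconds or longer, a per-call cost on this scale is unlikely to matter; it becomes relevant when the C function is tiny or is called very frequently.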

// File: myquadlib.c
// Compile with: gcc -shared -o libmyquadlib.so -fPIC myquadlib.c

typedef float (*func_t)(float);

void quad(func_t func, float start, float end, float *y, float *err) {
    float y0 = func(start);
    float y1 = func(end);
    // ... rest of integration code ...
    *y = 1.0;
    *err = 0.1;
}

# File: ctypes_server.py
# Run with: FLASK_APP=ctypes_server flask run
# Test with: curl http://localhost:5000/integrate/0.0/4.0

from flask import Flask
from ctypes import *
app = Flask(__name__)

c_func_t = CFUNCTYPE(c_float, c_float)
myquadlib = CDLL('./libmyquadlib.so')
myquadlib.quad.argtypes = [
    c_func_t, c_float, c_float, POINTER(c_float), POINTER(c_float),
]
myquadlib.quad.restype = None

def square(x):
    return x * x

@app.route('/integrate/<float:start>/<float:end>')
def integrate(start, end):
    y = c_float()
    err = c_float()
    myquadlib.quad(c_func_t(square), start, end, byref(y), byref(err))
    return { 'y': y.value, 'err': err.value }

Python server w/ SWIG & C library

With some extra work, a C library can also be exposed as a native Python extension module, with call overhead comparable to that of the extension modules in the Python standard library. This requires an extra compilation step that uses SWIG. That step is easiest to manage with a setup.py script, though integrating the script with other build tooling can be nontrivial.

The benefit of using SWIG is that internally, the call stack goes from Python (the web server) to SWIG library (myquadlib), which is almost as direct as one can get.

// File: myquadlib.i
// Compile with: a setup.py script that uses swig

%module myquadlib
%include "typemaps.i"

%{
#define SWIG_FILE_WITH_INIT
typedef float (* func_t)(float);

float square(float x) {
    return x * x;
}

func_t funcs[] = {
    square,
    NULL,
};

void quad(int funcid, float start, float end, float *y, float *err) {
    func_t func = funcs[funcid];
    float y0 = func(start);
    float y1 = func(end);
    // ... rest of integration code ...
    *y = 1.0;
    *err = 0.1;
}
%}

void quad(int funcid, float start, float end, float *OUTPUT, float *OUTPUT);

# File: setup.py
# Install with: python3.8 -m pip install --user .

from distutils.core import Extension, setup
from distutils.command.build import build as _build

# Define custom build order, so that the python interface module
# created by SWIG is staged in build_py.
class build(_build):
    # different order: build_ext *before* build_py
    sub_commands = [
        ('build_ext',     _build.has_ext_modules),
        ('build_py',      _build.has_pure_modules),
        ('build_clib',    _build.has_c_libraries),
        ('build_scripts', _build.has_scripts),
    ]

myquadlib = Extension(
    '_myquadlib',
    sources=['myquadlib.i'],
    swig_opts=['-py3'],
)

setup(
    cmdclass={'build': build},
    name='myquadlib',
    py_modules=['myquadlib'],
    ext_modules=[myquadlib],
)

# File: swig_server.py
# Run with: FLASK_APP=swig_server flask run
# Test with: curl http://localhost:5000/integrate/0.0/4.0

from flask import Flask
import myquadlib
from enum import IntEnum
app = Flask(__name__)

class Func(IntEnum):
    SQUARE = 0

@app.route('/integrate/<float:start>/<float:end>')
def integrate(start, end):
    y, err = myquadlib.quad(Func.SQUARE, start, end)
    return { 'y': y, 'err': err }

C Server w/ C library

The last alternative, if extra performance is desired, is to get rid of Python entirely and write the web server in C. There are many libraries for this; a pretty good one is GNU libmicrohttpd.

One major cost of this method is that all request parsing needs to be done manually in C, using some combination of sscanf(3) and libmicrohttpd's POST processor APIs. This requires extra care to avoid introducing vulnerabilities. One way to mitigate this risk is to put a reverse proxy like Nginx in front of the C server, configured to discard some possibly dangerous requests before they reach it.

// File: quadserver.c
// Compile with: gcc $(pkg-config libmicrohttpd --cflags) -o quadserver quadserver.c $(pkg-config libmicrohttpd --libs)
// Test with: curl http://localhost:5000/integrate/0.0/4.0

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <microhttpd.h>

#define PORT 5000

typedef float (* func_t)(float);

float square(float x) {
    return x * x;
}

void quad(func_t func, float start, float end, float *y, float *err) {
    float y0 = func(start);
    float y1 = func(end);
    // ... rest of integration code ...
    *y = 1.0;
    *err = 0.1;
}

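// Note: libmicrohttpd 0.9.71 and later declare this callback's return type
// as `enum MHD_Result`; on those versions, change `int` here (and the type
// of `ret`) to match.
int handler(void *cls, struct MHD_Connection *connection,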
            const char *url,
            const char *method, const char *version,
            const char *upload_data,
            size_t *upload_data_size, void **con_cls) {
    void *content;
    size_t contentlen;
    struct MHD_Response *response;
    int ret, success;
    float start, end, y, err;

    success = 1;
    if (success && strncmp(url, "/integrate/", strlen("/integrate/")) != 0) {
        printf("Bad request (wrong path): '%s'\n", url);
        success = 0;
    }

    if (success && 2 != sscanf(url, "/integrate/%f/%f", &start, &end)) {
        printf("Bad request (wrong format): '%s'\n", url);
        success = 0;
    }

    if (!success) {
        response = MHD_create_response_from_buffer(0, NULL, MHD_RESPMEM_PERSISTENT);
        ret = MHD_queue_response(connection, MHD_HTTP_BAD_REQUEST, response);
        MHD_destroy_response(response);
        return ret;
    }

    quad(square, start, end, &y, &err);

    contentlen = snprintf(NULL, 0, "{\"y\":%f,\"err\":%f}\n", y, err);
    content = malloc(1 + contentlen);
    snprintf(content, 1 + contentlen, "{\"y\":%f,\"err\":%f}\n", y, err);
    
    response = MHD_create_response_from_buffer(contentlen, content, MHD_RESPMEM_MUST_FREE);
    MHD_add_response_header(response, MHD_HTTP_HEADER_CONTENT_TYPE, "application/json");
    ret = MHD_queue_response(connection, MHD_HTTP_OK, response);
    MHD_destroy_response(response);
    return ret;
}

int main(int argc, char **argv) {
    struct MHD_Daemon *daemon;

    daemon = MHD_start_daemon(MHD_USE_INTERNAL_POLLING_THREAD, PORT, NULL, NULL,
                              &handler, NULL, MHD_OPTION_END);
    if (daemon == NULL) return 1;

    printf("Listening on 0.0.0.0:%d\n", PORT);
    printf("Press Enter to close server\n");
    getchar();

    MHD_stop_daemon(daemon);
    return 0;
}