Scientific code should be compiled and tuned to best match the processor architecture it runs on. Some processors offer special instructions for maximum performance; for instance, Intel has the SSE instructions, and AMD has its own extensions. Code compiled for one architecture doesn't necessarily run on another, which is the root of the problem.

The primary symptom you would see is an illegal instruction error (signal SIGILL, reported much like a segfault):

29 Illegal instruction     (core dumped)

This is especially prevalent when building and using Linux containers for HPC systems. On these systems, although you can use container images that have already been built, creating new container images is more challenging because you have to build them on a separate machine and then load them onto the HPC system.

For example, I build some of our container images on my server and then use them on one of several supercomputers. For most systems this is fine because both machines happened to have Intel Broadwell chips, but one of the machines is now AMD K10 based. In cross-compiling for that machine, I ran into a few hiccups, which is what this post is about.

Case Study: Spack

Spack is a tool to help build and manage dependencies for HPC systems. It makes things relatively easy because it allows you to specify the target architecture directly.

First, you can run spack arch on the target machine to determine what Spack believes your architecture is. In my case, this yielded linux-scientific7-k10, which follows the platform-os-target triplet (Architecture Specifiers). The most important part is the target.

Next, you can change your spack install command to specify that target. For instance:

$ spack install mpich target=k10

In the case that you're using a spack.yaml configuration file, you can add the packages subtree and specify your target under packages > all > target (Concretization Preferences).

    spack:
      specs:
        - mpich
      packages:
        all:
          target: [k10]

One last thing you can do is find the -march and -mtune parameters to use for other cross-compiling endeavors. The easiest way to do this is to check the source code (microarchitectures.json) for your target. For k10, this entry looks like:

    // ...
    "k10": {
      "from": "x86_64",
      "vendor": "AuthenticAMD",
      "features": [
        // ...
      "compilers": {
        "gcc": {
          "name": "amdfam10",
          "versions": "4.3:",
          "flags": "-march={name} -mtune={name}"
        // ...
    // ...

Based on the compilers > gcc subtree, we can see that the right flags are -march=amdfam10 and -mtune=amdfam10. If they weren't specified for k10, we could follow the "from" field and check the x86_64 entry for its fields instead.
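This lookup can also be scripted. The sketch below resolves gcc flags from a dict shaped like the microarchitectures.json excerpt above, walking the "from" chain when a target has no entry for the compiler; the data here is a trimmed, hypothetical subset of the real file, not its actual contents.

```python
# Resolve gcc flags for a target from microarchitectures.json-shaped data.
# ARCHES is a trimmed, hypothetical subset of the real file for illustration.
ARCHES = {
    "x86_64": {
        "from": None,
        "compilers": {
            "gcc": {"name": "x86-64", "flags": "-march={name} -mtune={name}"},
        },
    },
    "k10": {
        "from": "x86_64",
        "compilers": {
            "gcc": {"name": "amdfam10", "flags": "-march={name} -mtune={name}"},
        },
    },
}

def gcc_flags(target, arches=ARCHES):
    # Walk from the requested target up the "from" chain until we find
    # a gcc entry, mirroring how a missing entry falls back to x86_64.
    while target is not None:
        entry = arches[target]
        gcc = entry.get("compilers", {}).get("gcc")
        if gcc:
            return gcc["flags"].format(name=gcc["name"])
        target = entry["from"]
    raise KeyError("no gcc entry found along the 'from' chain")

print(gcc_flags("k10"))  # -march=amdfam10 -mtune=amdfam10
```

For k10 this reproduces the flags read off the JSON by hand; for a target without its own gcc entry, it would fall back to the parent microarchitecture.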

Case Study: Python

Python makes it relatively easy to change the compiler target, at least using pip. For this, you need to set the CFLAGS environment variable before installing dependencies (StackOverflow answer that references this).

In my case, I already had the -march and -mtune flags I needed, so I just had to run:

$ CFLAGS='-march=amdfam10 -mtune=amdfam10' \
>   python3 -m pip install networkx

Case Study: Rivet

Rivet is a suite of libraries for particle physics simulation and analysis. For end-users, it is compiled using a bootstrap script, which internally calls into the Rivet libraries' autotools configure script.

To tell autotools what architecture to build for, there are the --build, --host, and --target options, which are all a little confusing. Each takes a triplet of the form cpu-company-system (System Type – Autoconf). In general, you can ignore --build and --target for most regular libraries; they only matter when compiling compilers (Specifying Target Triplets). To make things easier, you can use shorter aliases for your architecture, and a script will normalize them (gcc/config.sub).

In my case, I wasn't able to find out exactly what name to use for my target architecture, but I did already know which -march and -mtune values I wanted. Instead, I found that you can pass CFLAGS directly to the configure script and it will do the right thing (GitHub issue that references this feature).

$ ./configure CFLAGS='-march=amdfam10 -mtune=amdfam10'
$ make
$ make install

Also, the Rivet bootstrap script doesn't expose a way to give the configure scripts any extra arguments. Instead, it uses a function that just passes an install prefix along. The replacement is as follows:

# before
function conf { ./configure --prefix=$INSTALL_PREFIX "$@"; }

# after
function conf {
  ./configure CFLAGS='-march=amdfam10 -mtune=amdfam10' --prefix=$INSTALL_PREFIX "$@"
}

There's a common sentiment of using *nix as an IDE, in contrast to using a more traditional IDE like Visual Studio, VSCode, JetBrains, etc. The argument often comes down to whether you want a single tool to act as the IDE, or if you want the IDE to be made up of smaller pieces that work together.

I fall pretty heavily into the “*nix as an IDE” camp, with the exception that I want my shells to be well integrated into my editors so that they act as one. In this post, I talk about the utility of having this kind of integration and about how to achieve it in Vim, from enabling the feature to actual uses.


My research over the past few years has been targeted towards the realm of scientific microservices. To be concrete, in this post I am using the following definitions:

Many of the most useful scientific tools are only usable from heavyweight, monolithic native applications. Some examples include ParaView, VisIt, and Tableau. Although these tools have improved and now offer a degree of “scriptability” and control, they are still designed for and used by single users on their own computers. Because the tools are so heavy, everyone who wants to use them needs a powerful computer of their own. In an organization (whether a business or a school), it would be better to buy one really powerful machine and let users borrow the compute resources of that server.

Web services support this role of resource sharing exceptionally well. In fact, ParaView has adopted this functionality in their tool ParaViewWeb, and although it is very exciting for embedding visualization in many applications, it still falls short in an important aspect: it still intends for only one user per machine. One reason for this is that, although ParaView now communicates over HTTP, it is still monolithic under the hood and must be treated as such. Hence, it is not sufficient to have a “service,” because the service may still be too large.

Microservices have taken off across many companies and organizations. They differ from traditional services in that each microservice is responsible for a very small domain. For example, where a single service may be responsible for users, payment processing, and the domain logic of an application, a microservice solution would have at least three separate services: one for users, one for payments, and one for domain logic.

Exposing these scientific tools with a web server is nontrivial. They are often written in C/C++ with high-performance libraries that require specific environments to function. For example, a tool might use Open MPI, and its executables need to be launched with mpirun(1) rather than being exposed as a shared library.
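In practice this means the tool often has to be wrapped as a subprocess rather than imported. The wrapper below is a minimal sketch of that idea; the mpirun launcher arguments and the tool's name are assumptions for illustration, not any real tool's interface.

```python
import subprocess

def run_tool(argv, launcher=("mpirun", "-n", "4")):
    """Launch a native tool under an MPI launcher and capture its stdout.

    `launcher` defaults to a hypothetical mpirun invocation; pass a
    different launcher tuple (or an empty one) to run the executable
    some other way.
    """
    result = subprocess.run(
        [*launcher, *argv],
        capture_output=True,
        text=True,
        check=True,  # raise if the tool exits nonzero
    )
    return result.stdout
```

For example, run_tool(["./simulate", "--steps", "100"]) would run the hypothetical binary under four MPI ranks and return whatever it printed.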

This post is primarily to showcase different methods of operating scientific tools in Python using a web server. For simplicity, the code samples target the Flask web framework and a quadratic integration method. Where possible, we try to support different functions, and in some cases, we can even pass in a Python function that the tool can call directly instead of pre-compiling a set of functions.
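As a baseline for the simplest of these methods, here is a minimal Flask sketch. The /integrate endpoint, its query parameters, and the pure-Python trapezoidal rule standing in for the native integration routine are all assumptions for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def integrate(f, a, b, n=1000):
    """Composite trapezoidal rule, standing in for the native tool."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2.0 + sum(f(a + i * h) for i in range(1, n))
    return total * h

@app.route("/integrate")
def integrate_endpoint():
    # Hypothetical query parameters: the bounds of integration.
    a = float(request.args.get("a", 0.0))
    b = float(request.args.get("b", 1.0))
    # Integrate a fixed f(x) = x^2 over [a, b] for this sketch.
    return jsonify(result=integrate(lambda x: x * x, a, b))
```

Running app.run() and requesting /integrate?a=0&b=1 should return a result close to 1/3.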

The methods showcased range from least to most effort and likewise from least to most performance:


I had written up a few posts before, but that setup wasn't working for me. Now, with this WriteFreely blog, I'm hoping to write more. Regardless, the old posts are still available:

old posts