on debugging a c++ function with float arguments

August 17, 2022

I ran into a problem when I tried to debug some C++ code that uses VTK. The problem ultimately came down to GDB not understanding that some arguments were being passed via registers instead of on the stack. I worked around this problem using the GDB convenience function $_caller_is.

Code Example

/**
 *
 */

// stdlib
#include <cstdio>

// VTK
#include <vtkSetGet.h>
#include <vtkObject.h>
#include <vtkNew.h>
#include <vtkObjectFactory.h>


//--- Define a simple vtkObject subclass with the offending method.

struct Foo : public vtkObject {
    static Foo *New();

    vtkSetVector6Macro(Bounds, float);
    float Bounds[6];
};

vtkStandardNewMacro(Foo);


//--- Demonstrate the bug.

int main() {
    vtkNew<Foo> foo;

    float bounds[6];
    bounds[0] = 0.373737f;
    bounds[1] = 1.373737f;
    bounds[2] = 2.373737f;
    bounds[3] = 3.373737f;
    bounds[4] = 4.373737f;
    bounds[5] = 5.373737f;

    std::fprintf(stderr, "main.bounds = { %+0.2f, %+0.2f, %+0.2f, %+0.2f, %+0.2f, %+0.2f };\n",
        bounds[0], bounds[1], bounds[2], bounds[3], bounds[4], bounds[5]);

    foo->SetBounds(bounds);

    std::fprintf(stderr, "foo.Bounds = { %+0.2f, %+0.2f, %+0.2f, %+0.2f, %+0.2f, %+0.2f };\n",
        foo->Bounds[0], foo->Bounds[1], foo->Bounds[2], foo->Bounds[3], foo->Bounds[4], foo->Bounds[5]);

    return 0;
}

My code looked very similar to this. I attached a debugger to it and ran the following GDB batch file.

#!/usr/bin/env -S gdb -x
start
break -function Foo::SetBounds
commands
  printf "---8<---\n"
  info args
  printf "--->8---\n"
  continue
end
continue

When I ran this, the breakpoint was hit twice. At the first SetBounds hit, info args showed the correct argument: a pointer to a float array with the correct contents. At the second SetBounds hit, it showed garbage floating-point values with extreme exponents (e-41, e21, etc.).

To elaborate: in VTK's vtkSetGet.h header, the vtkSetVector6Macro is defined, roughly, as:

#define vtkSetVector6Macro(Name, Type) \
  void Set##Name(const Type *_arg) { \
    this->Set##Name(_arg[0], _arg[1], _arg[2], _arg[3], _arg[4], _arg[5]); \
  } \
  void Set##Name(Type _arg1, Type _arg2, Type _arg3, Type _arg4, Type _arg5, Type _arg6) { \
    this->Name[0] = _arg1; \
    this->Name[1] = _arg2; \
    this->Name[2] = _arg3; \
    this->Name[3] = _arg4; \
    this->Name[4] = _arg5; \
    this->Name[5] = _arg6; \
  }

In other words, one function takes a pointer to an array of a type, and the other takes each argument individually. The first function defers to the second to actually update the member variable.
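
To see the same overload pair without pulling in VTK, here is a minimal sketch. PlainFoo is a hypothetical stand-in that I made up for illustration; it is not a VTK class, and it leaves out whatever else the real macro does.

```cpp
// Hypothetical stand-in for the vtkObject subclass; no VTK required.
struct PlainFoo {
    float Bounds[6] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};

    // Pointer overload: unpacks the array and forwards to the overload below.
    void SetBounds(const float *arg) {
        this->SetBounds(arg[0], arg[1], arg[2], arg[3], arg[4], arg[5]);
    }

    // Six-value overload: the one that actually writes the member.
    void SetBounds(float a1, float a2, float a3, float a4, float a5, float a6) {
        this->Bounds[0] = a1;
        this->Bounds[1] = a2;
        this->Bounds[2] = a3;
        this->Bounds[3] = a4;
        this->Bounds[4] = a5;
        this->Bounds[5] = a6;
    }
};
```

Both overloads share the name SetBounds, so a breakpoint set by function name alone lands in each of them. Calling the pointer overload once therefore produces two breakpoint hits.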

Diagnosing the Problem

I'd already suspected that this was a “register vs stack” problem. For this reason, I checked the disassembly of each SetBounds function and saw that the first SetBounds function was writing to the %xmm1, %xmm2, etc. registers and the second SetBounds function was reading from the %ymm1, %ymm2, etc. registers.

Aside: I was a little confused by the switch between %xmm1 and %ymm1. I read online that these are SIMD registers that hold multiple floats, chars, etc. within a single register. The %xmm1 register is a 128-bit SSE register that holds 4 floats, while the %ymm1 register is its 256-bit AVX extension and holds 8 floats. The two are not separate storage: %xmm1 is simply the lower half of %ymm1, so the first 4 floats are the same in both.

To verify this, I set a breakpoint in the second SetBounds function, verified that info args showed the incorrect garbage values, and then verified that the %ymm1 register output by info all-registers had the correct values.
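
As an aside on what garbage like e-41 can look like at the bit level: a float that small is a denormal, which is what you get when a small integer bit pattern is read as a float. The sketch below assumes 32-bit IEEE-754 floats, and bits_as_float is a hypothetical helper name of my own. It only illustrates the effect; I have not confirmed which bytes GDB actually read in my case.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a float.
// Assumes float is a 32-bit IEEE-754 value (checked for size below).
// std::memcpy avoids the undefined behavior of pointer type punning.
inline float bits_as_float(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For instance, bits_as_float(0x00007ffd) is a denormal on the order of 1e-41, and 0x00007ffd is a value that could plausibly appear as the upper half of a 64-bit stack address on Linux. In other words, the garbage is consistent with GDB decoding unrelated memory as floats.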

Working Around the Problem

Ideally, in GDB, I'd have a way to set a breakpoint on only one of the SetBounds functions. Barring that, I can check if the function that called us is also SetBounds or not (i.e. if we're the second or first, respectively). In a GDB script, that looks like:

#!/usr/bin/env -S gdb -x
start
break -function Foo::SetBounds if ! $_caller_is("Foo::SetBounds")
commands
  printf "Foo::SetBounds((float[6])"
  output *(float *)_arg@6
  printf ")\n"
  continue
end
continue