Experimenting with Qwen/QwQ
The Qwen Team released Qwen/QwQ-32B-Preview about a month ago, and I've been testing it out.
I don't currently have a GPU capable of running the model myself, so I've been using OpenRouter to make API calls.
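For reference, the calls look roughly like this. This is a minimal sketch using the openai Python client, which works because OpenRouter exposes an OpenAI-compatible endpoint; the `qwen/qwq-32b-preview` slug and the placeholder API key are assumptions about your setup:

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions API, so the standard
# client works once you point it at OpenRouter's base URL.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter API key
)

response = client.chat.completions.create(
    model="qwen/qwq-32b-preview",  # slug as listed on OpenRouter (assumed)
    messages=[
        {"role": "user", "content": "Theorize about how much a hay bale weighs."}
    ],
)
print(response.choices[0].message.content)
```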
The model is designed to produce answers in a LaTeX-style \boxed{answer goes here}
format. However, my experience has been that it very rarely uses that format for my questions. As a result, I've been using QwQ to work through a problem, and then a more consistent model (usually Llama 3.2 11B Vision) to summarize and produce a final answer.
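In code, that two-step flow looks something like the sketch below: try to pull a \boxed{...} answer out of QwQ's output, and if there isn't one, hand the full reasoning to Llama to distill. The regex is deliberately naive (it won't handle nested braces), and both model slugs are assumptions based on what OpenRouter lists:

```python
import re

from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")


def ask(model: str, prompt: str) -> str:
    """Send a single-turn prompt to a model and return its reply."""
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content


def answer(question: str) -> str:
    # Step 1: let QwQ reason through the problem at length.
    reasoning = ask("qwen/qwq-32b-preview", question)

    # If QwQ did use the \boxed{...} format, take that answer directly.
    # (Naive pattern: fails on answers that contain nested braces.)
    boxed = re.search(r"\\boxed\{([^{}]*)\}", reasoning)
    if boxed:
        return boxed.group(1)

    # Step 2: otherwise, ask a more consistent model for a final answer.
    return ask(
        "meta-llama/llama-3.2-11b-vision-instruct",
        "Summarize the following reasoning into a single final answer:\n\n"
        + reasoning,
    )
```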
My favorite prompt for testing out its performance is to ask it to “Theorize about how much a hay bale weighs” (chat transcripts: first and second). The answers given seem reasonable according to this website.
To test this process, I created a small web app (with Claude's help), though you'll need to provide your own OpenRouter API key to use it.
I've also been testing this model on compare-and-contrast problems. For instance, I've given it four separate sets of things (e.g., movies or genres) and asked it to figure out what each set has in common, and the QwQ + Llama pipeline gives a more detailed response than Llama alone; a sketch of that prompt construction follows.
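Driving the same pipeline for compare-and-contrast questions is just prompt construction. This snippet reuses the hypothetical answer() helper from the sketch above, and the example sets are made up:

```python
# Made-up example sets; each is a list of things to find commonalities in.
sets_of_things = [
    ["Alien", "The Thing", "Annihilation"],
    ["film noir", "cyberpunk", "Southern Gothic"],
]

for items in sets_of_things:
    prompt = (
        "Here is a set of things: " + ", ".join(items) + ". "
        "Figure out what they have in common, and explain your reasoning."
    )
    print(answer(prompt))  # answer() is the QwQ + Llama helper sketched above
```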