Anybody else getting terrible speeds with Qwen3-Next 80B on llama.cpp? It fits easily with a GPU/CPU split, and it's smaller than the Air quant I was running before, but it's generating replies as slowly as a dense model would. They're both MoEs, right? Why is Qwen so slow?
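
For context, this is roughly how I'm launching it (model path and quant are placeholders for my setup; `-ngl 99` pushes all layers to the GPU and the `-ot` pattern overrides the MoE expert tensors back to CPU RAM, which is the usual split for MoE models):

```
# placeholder path/quant; adjust context size etc. for your hardware
./build/bin/llama-server \
  -m models/Qwen3-Next-80B-A3B-Instruct-Q4_K_M.gguf \
  -ngl 99 \
  -ot "\.ffn_.*_exps\.=CPU" \
  -c 16384
```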

I'm running Qwen3-Next on the PR #16095 branch of llama.cpp.
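
In case anyone wants to reproduce, I fetched and built the PR branch the standard GitHub way (the local branch name is arbitrary, and I'm assuming a CUDA build; swap the backend flag for your setup):

```
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# fetch PR #16095 into a local branch and check it out
git fetch origin pull/16095/head:qwen3-next
git checkout qwen3-next
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```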