>entire model loaded on the gpu
>cpu at max usage during inference
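Quick way to confirm the fallback (just a sketch, not from the PR; assumes psutil and nvidia-ml-py are installed and the model is on GPU 0): sample CPU and GPU utilization while it's generating.

# Rough sanity check: sample CPU and GPU utilization during inference
# to see where the work is actually landing.
import time

import psutil
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes device 0

for _ in range(10):
    cpu = psutil.cpu_percent(interval=1.0)  # % across all cores over 1s
    util = pynvml.nvmlDeviceGetUtilizationRates(gpu).gpu  # % GPU busy
    print(f"cpu {cpu:5.1f}%  gpu {util:3d}%")
# CPU pegged while the GPU sits near idle = ops falling back to the CPU.

pynvml.nvmlShutdown()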
Something's up with that PR, but anyway, here's the cockbench for Qwen3 Next.