>Dialing in my performance/args for the big GLM4.5
>6.11 t/s token gen
Huh, I can live with that, just barely
>22.16 t/s prompt processing
KILL ME.
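To see why the prompt processing number hurts more, here's a quick back-of-envelope wall-time calc using the two rates above. The 8000-token prompt and 500-token reply are made-up example sizes, not measurements.

```python
def wall_time(prompt_tokens, gen_tokens, pp_tps=22.16, tg_tps=6.11):
    """Rough seconds spent on prompt processing vs token generation.

    pp_tps / tg_tps default to the rates quoted above; the token counts
    passed in are hypothetical workload sizes.
    """
    pp_seconds = prompt_tokens / pp_tps  # time to ingest the prompt
    tg_seconds = gen_tokens / tg_tps     # time to generate the reply
    return pp_seconds, tg_seconds


if __name__ == "__main__":
    pp, tg = wall_time(8000, 500)
    print(f"prompt processing: {pp:.0f} s, generation: {tg:.0f} s")
```

With a long-ish prompt, you sit through roughly six minutes of prompt processing before the first token shows up, and that dwarfs the reply itself.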
Also, after some dicking around: the -ncmoe arg turns out to be less efficient than just doing a manual -ot with *exps.=CPU, but not by a whole lot.
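For anyone wondering what the two options actually do, here's a rough sketch of which tensors each one pins to CPU. It assumes GGUF-style names like `blk.N.ffn_up_exps.weight`. It also assumes -ot regex-searches tensor names and that -ncmoe N only covers the expert tensors of the first N layers. The helper names are hypothetical. This only shows tensor placement and says nothing about why one is faster.

```python
import re


def tensor_names(n_layers):
    """Hypothetical GGUF-style tensor names for a small MoE model."""
    names = []
    for i in range(n_layers):
        names.append(f"blk.{i}.attn_q.weight")  # non-expert tensor, stays on GPU
        for proj in ("up", "down", "gate"):
            names.append(f"blk.{i}.ffn_{proj}_exps.weight")  # expert tensors
    return names


def manual_ot(names, pattern=r"exps\."):
    """Manual override: every tensor whose name matches the regex goes to CPU."""
    rx = re.compile(pattern)
    return {n for n in names if rx.search(n)}


def ncmoe(names, n_cpu_layers):
    """Assumed -ncmoe behaviour: expert tensors of the first N layers go to CPU."""
    rxs = [re.compile(rf"blk\.{i}\.ffn_(up|down|gate)_exps") for i in range(n_cpu_layers)]
    return {n for n in names if any(r.search(n) for r in rxs)}


if __name__ == "__main__":
    names = tensor_names(4)
    print(sorted(manual_ot(names)))  # all 12 expert tensors
    print(sorted(ncmoe(names, 2)))   # only the experts in layers 0 and 1
```

Under these assumptions, setting N to the full layer count puts the same tensors on CPU as the manual pattern. A smaller N leaves the later layers' experts on GPU.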