Do the backend (llama.cpp/koboldcpp) and OS (Windows/Linux) affect your speed at all, or is it just VRAM?
I'm getting 2-5 t/s on generation (30-50 t/s on prompt processing) for GLM 4.5 Air Q3_K_M with a 4090 (16GB VRAM) + 64GB RAM. Gemini says I should be getting 20-50 t/s on generation, but I'm getting nowhere close (7 t/s with no context). I'm running koboldcpp (28 layers offloaded to GPU) on Windows because I can't get ik_llama set up. Genuinely confused about what I should expect and aim for here.
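For context, this is roughly how I'm launching it (model filename is a placeholder; the layer count matches what I described above, everything else is left at defaults):

koboldcpp.exe --model GLM-4.5-Air-Q3_K_M.gguf --usecublas --gpulayers 28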