>>106367918
VRAM usage didn't go over 24GB when I fully offloaded it (at Q4 as well) with 14k context. I'm using GLM 32B right now (same size, also Q4, now at 16k context) and I'm getting around 35 t/s.