>>106335669
Iirc air would start at around 5-6 t/s at no context and then degrade to a 2-3 t/s slogfest at the 36k context mark. Now I'll probably re-download it to test with draft too. Meanwhile 4.5 is at a consistent 6-7.5 t/s even with the context filled up to 36k
>>106335686
I'm very dumb desu, I started looking at the documentation on the different console flags rn and mostly used ooba before. Lowering batch size from 4k to 1k and using a lower cache quant allowed me to bump up n-gpu layers from 21 to 23, thinking more layers on the gpu = faster, but yeah no idea wtf I was doing. Now I do slightly more, so thank you for the tips
>>106335704
Once it finishes re-downloading air I'll try adding it in, wish I'd known that before going full retard scorched-earth kek