>>106869568
tg 7.1 t/s at 16k context seems okay; prompt processing seems low, maybe because of the short context?
i get 5.6 t/s tg at 32k (q8_0); see picrel for prompt processing.
to be fair, picrel is from an older commit. here's a newer prompt processing result:
INFO [ print_timings] prompt eval time = 161920.52 ms / 30489 tokens ( 5.31 ms per token, 188.30 tokens per second)
this is with -ub 1024 or 2048, i forget which
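if you want to sanity-check numbers like these, here's a minimal sketch that parses the print_timings line above and recomputes ms/token and tokens/s from the raw totals. it assumes the log line always has the "<ms> ms / <n> tokens" shape seen here; that's based on this one line, not on any documented log format. parse_timing and throughput are just illustrative helper names.

```python
import re

# The prompt-eval line quoted above, copied from the ik_llama.cpp server log.
LINE = ("INFO [ print_timings] prompt eval time = 161920.52 ms / 30489 tokens "
        "( 5.31 ms per token, 188.30 tokens per second)")

# Assumption: the totals always appear as "= <ms> ms / <n> tokens".
PATTERN = re.compile(r"=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*tokens")


def parse_timing(line: str) -> tuple[float, int]:
    """Return (total milliseconds, token count) from a print_timings line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"unrecognized timing line: {line!r}")
    return float(m.group(1)), int(m.group(2))


def throughput(total_ms: float, n_tokens: int) -> tuple[float, float]:
    """Return (ms per token, tokens per second)."""
    ms_per_token = total_ms / n_tokens
    tokens_per_second = n_tokens / (total_ms / 1000.0)
    return ms_per_token, tokens_per_second


if __name__ == "__main__":
    total_ms, n_tokens = parse_timing(LINE)
    ms_tok, tps = throughput(total_ms, n_tokens)
    print(f"{ms_tok:.2f} ms/token, {tps:.2f} tokens/s")  # 5.31 ms/token, 188.30 tokens/s
    # tg rates mentioned in the post, converted to per-token latency.
    for label, rate in [("16k", 7.1), ("32k", 5.6)]:
        print(f"tg @ {label}: {1000.0 / rate:.0f} ms/token")
```

the recomputed values match what the server printed, and the tg rates work out to roughly 141 ms/token at 16k vs 179 ms/token at 32k.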
picrel is with -ub 4096 -b 4096
my setup: rtx 3060 12gb, 64gb ddr4, i5 12400f
quant: IQ4_KSS
ik_llama.cpp