>>106342486
Oh yeah, fair enough. Here are the generation settings through ST. It's around 100, usually less.

Launch command for the backend is:
set OMP_NUM_THREADS=28
set OMP_PROC_BIND=TRUE
set OMP_PLACES=cores
set GGML_CUDA_FORCE_CUBLAS=1
llama-server.exe --model "F:\text-generation-webui-3.6.1\user_data\models\DeepSeek-R1-UD-IQ1_S\UD-IQ1_S\DeepSeek-R1-0528-UD-IQ1_S-00001-of-00004.gguf" --ctx-size 8192 --port 8080 --n-gpu-layers 999 -ot exps=CPU --flash-attn --threads 28 --batch-size 8192 --ubatch-size 4096 --cache-type-k q4_0 --cache-type-v q4_0 --mlock
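
If you want to sanity-check the backend without going through ST, llama-server also exposes an OpenAI-compatible endpoint on the same port, so something like this curl should come back with a completion (the prompt and max_tokens here are just placeholder values, adjust to taste):
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d "{\"messages\":[{\"role\":\"user\",\"content\":\"say hi\"}],\"max_tokens\":100}"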