6/24/2025, 11:22:18 PM
>>105692033
probably final update: performance comparison at different context depths. Only ran with `--repetitions 1`, since it already takes a long time as it is.
unsloth+llama.cpp pp512 uses the GPU (GTX 1060 6GB), ubergarm+ik_llama.cpp pp512 uses CPU only. Both tg128 runs are CPU only.
At 8k context you can see a big difference: 3x pp and 2x tg with ik_llama.
Interesting point: running `llama-server` with the same flags as `llama-bench` doesn't throw a CUDA error, and pp on the GPU works just fine...
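For reference, a sketch of what that server invocation looks like. The model path, thread count, offload count, and context size are all placeholders (the actual values from the bench run aren't shown in the post), only the flag names are real `llama-server` options:

```shell
# hypothetical invocation: -m/-t/-ngl mirror the llama-bench flags,
# -c sets the context window the depth test exercised
./llama-server -m ./model.gguf -t 8 -ngl 10 -c 8192
```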
Anyways, this is the kind of performance you can expect from ~400€ total worth of hardware: not great, but not terrible either considering the cost.
bonus: quick and dirty patch adding `-d, --n-depth` support to ik_llama, to compare results with llama.cpp: https://files.catbox.moe/e64yat.patch
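With that patch applied, ik_llama's `llama-bench` should accept the same depth sweep as mainline llama.cpp. A sketch, with the model path as a placeholder and depths picked to match the 0–8k range discussed above:

```shell
# hypothetical sweep: pp512/tg128 measured at several context depths,
# single repetition since each pass already takes a long time
./llama-bench -m ./model.gguf -p 512 -n 128 -d 0,2048,4096,8192 -r 1
```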