>>106196144
tried with that: text gen is now as fast as llama.cpp, but prompt processing is 5x slower
./llama-server --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 -c 16384 --no-mmap -fa -ub 2048 -b 2048 -fmoe -amb 512 -rtr
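For readability, here is the same invocation broken across lines with each flag group annotated. This is a sketch: the flag notes are my reading of the command, and -fmoe, -amb, and -rtr exist only in the ik_llama.cpp fork, not mainline llama.cpp.

```shell
# ik_llama.cpp server launch (annotations are assumptions, not official docs)
./llama-server \
  --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf \
  -ot ffn_up_shexp=CUDA0 -ot exps=CPU \
  -ngl 100 -t 6 -c 16384 --no-mmap \
  -fa -ub 2048 -b 2048 \
  -fmoe -amb 512 -rtr
# -ot PATTERN=DEVICE  override tensor placement by regex: shared-expert FFN
#                     up-projections pinned to GPU 0, routed experts kept in CPU RAM
# -ngl 100            offload up to 100 layers to the GPU
# -t 6 / -c 16384     6 CPU threads, 16k context; --no-mmap loads weights into RAM
# -fa                 flash attention; -ub/-b set micro-batch/batch size,
#                     which mainly affects prompt-processing speed
# -fmoe               fused MoE kernels (ik_llama.cpp only)
# -amb 512            cap attention compute buffers at ~512 MiB (ik_llama.cpp only)
# -rtr                run-time repack of quantized weights for CPU (ik_llama.cpp only)
```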