6/24/2025, 10:58:13 PM
Ahh, I am running:
./llama.cpp/build/bin/llama-cli \
--rpc "$RPC_SERVERS" \
--model models/unsloth/DeepSeek-R1-0528-GGUF/UD-Q2_K_XL/DeepSeek-R1-0528-UD-Q2_K_XL-00001-of-00006.gguf \
--cache-type-k q4_0 \
--threads -1 \
--n-gpu-layers 99 \
--prio 3 \
--temp 0.6 \
--top_p 0.95 \
--min_p 0.01 \
--ctx-size 16384 \
-ot ".ffn_.*_exps.=CPU" \
-no-cnv \
--prompt "<|User|> blabal <|Assistant|>"
and I get 1 t/s on 12 A5000s. Is that bad?
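For reference, the `-ot ".ffn_.*_exps.=CPU"` override keeps every tensor whose name matches the regex in system RAM instead of on the GPUs or RPC servers. The Python sketch below checks which tensor names the pattern catches. It assumes llama.cpp applies the pattern as a regex search (like Python's `re.search`) and that the GGUF uses names such as `blk.N.ffn_down_exps.weight`. The list of names is a hypothetical sample, not read from the actual model file.

```python
import re

# The pattern part of -ot ".ffn_.*_exps.=CPU" (everything before "=CPU").
PATTERN = re.compile(r".ffn_.*_exps.")

# Hypothetical sample, assuming llama.cpp-style GGUF names for DeepSeek MoE blocks.
SAMPLE_TENSORS = [
    "blk.0.ffn_gate.weight",        # dense FFN in an early layer
    "blk.3.attn_q_a.weight",        # attention projection
    "blk.3.ffn_gate_inp.weight",    # expert router
    "blk.3.ffn_gate_shexp.weight",  # shared expert
    "blk.3.ffn_gate_exps.weight",   # routed experts
    "blk.3.ffn_up_exps.weight",
    "blk.3.ffn_down_exps.weight",
]


def placement(name: str) -> str:
    """Return where the override would send a tensor, assuming search semantics."""
    if PATTERN.search(name):
        return "CPU"
    return "GPU/RPC (per --n-gpu-layers)"


if __name__ == "__main__":
    for tensor in SAMPLE_TENSORS:
        print(f"{tensor:32} -> {placement(tensor)}")
```

If the names follow this scheme, only the routed-expert tensors (`*_exps`) are sent to CPU; the router (`ffn_gate_inp`), shared experts (`*_shexp`), attention, and dense FFN layers stay with the `--n-gpu-layers` placement.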