Search Results
6/25/2025, 12:11:18 PM
>>105698422
./ik_llama.cpp/build/bin/llama-cli \
--rpc "$RPC_SERVERS" \
--model models/ubergarm/DeepSeek-R1-0528-GGUF/IQ2_K_R4/DeepSeek-R1-0528-IQ2_K_R4-00001-of-00005.gguf \
--threads 48 \
--n-gpu-layers 99 \
--temp 0.6 \
--top_p 0.95 \
--min_p 0.01 \
--ctx-size 16384 \
--flash-attn \
-mla 3 \
-amb 512 \
-fmoe \
-ctk q8_0 \
-ot "blk\.(1|2|3|4|5|6)\.ffn_.*=CUDA0" \
-ot "blk\.(7|8|9|10)\.ffn_.*=CUDA1" \
-ot "blk\.(11|12|13|14)\.ffn_.*=CUDA2" \
-ot "blk\.(15|16|17|18)\.ffn_.*=CUDA3" \
-ot "blk\.(19|20|21|22)\.ffn_.*=RPC[10.0.0.28:50052]" \
-ot "blk\.(23|24|25|26)\.ffn_.*=RPC[10.0.0.28:50053]" \
-ot "blk\.(27|28|29|30)\.ffn_.*=RPC[10.0.0.28:50054]" \
-ot "blk\.(31|32|33|34)\.ffn_.*=RPC[10.0.0.28:50055]" \
-ot "blk\.(35|36|37|38)\.ffn_.*=RPC[10.0.0.40:50052]" \
-ot "blk\.(39|40|41|42)\.ffn_.*=RPC[10.0.0.40:50053]" \
-ot "blk\.(43|44|45|46)\.ffn_.*=RPC[10.0.0.40:50054]" \
-ot "blk\.(47|48|49|50)\.ffn_.*=RPC[10.0.0.40:50055]" \
--override-tensor exps=CPU \
--prompt
I am getting 5.5 T/s, 2 T/s worse than llama.cpp. Also, the ubergarm/DeepSeek-R1-0528-GGUF/IQ2_K_R4 quants are yapping like hell; they couldn't even 0-shot a working Flappy Bird clone, while unsloth/DeepSeek-R1-0528-GGUF/UD-Q2_K_XL has no problems.
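Each `-ot` flag in the command above maps a contiguous range of `ffn_` tensors to one device via an alternation regex. Rather than typing each one out, the regexes can be generated; a minimal POSIX shell sketch (the function name `ot_arg` is made up for illustration):

```shell
#!/bin/sh
# ot_arg FIRST LAST DEVICE
# Prints an -ot override mapping blocks FIRST..LAST's ffn_ tensors to DEVICE,
# e.g.  ot_arg 1 6 CUDA0  ->  -ot "blk\.(1|2|3|4|5|6)\.ffn_.*=CUDA0"
ot_arg() {
  first=$1; last=$2; dev=$3
  # Join the block numbers with "|" to form the regex alternation.
  alts=$(seq "$first" "$last" | paste -sd'|' -)
  printf -- '-ot "blk\\.(%s)\\.ffn_.*=%s"\n' "$alts" "$dev"
}

# Reproduce two of the overrides from the command above:
ot_arg 1 6 CUDA0
ot_arg 19 22 'RPC[10.0.0.28:50052]'
```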
>>105698577
I am using the CLI with a prompt to get some testing done, because I don't have a client ready yet.
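The value of `$RPC_SERVERS` is not shown in the post. Assuming it mirrors the `RPC[...]` targets in the command (two hosts, four `rpc-server` instances each on ports 50052-50055, comma-separated as `llama-cli --rpc` expects), it could be built like this; `rpc_list` is a hypothetical helper:

```shell
#!/bin/sh
# rpc_list HOST...
# Prints a comma-separated host:port list, four ports per host (50052-50055),
# matching the assumed rpc-server layout from the post.
rpc_list() {
  for host in "$@"; do
    for port in 50052 50053 50054 50055; do
      printf '%s:%s\n' "$host" "$port"
    done
  done | paste -sd',' -
}

RPC_SERVERS=$(rpc_list 10.0.0.28 10.0.0.40)
echo "$RPC_SERVERS"
```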