6/15/2025, 5:17:50 PM
>>105600177
>>105600181
>>105600228
As you can see, I used a beefy prompt. Prompt-processing (PP) speed is the same; token-generation (TP) speed is way down.
>LLAMA-CLI log
https://pastebin.com/0qpzek00
llama_perf_sampler_print: sampling time = 89.24 ms / 11473 runs ( 0.01 ms per token, 128560.54 tokens per second)
llama_perf_context_print: load time = 23272.08 ms
llama_perf_context_print: prompt eval time = 966150.70 ms / 10847 tokens ( 89.07 ms per token, 11.23 tokens per second)
llama_perf_context_print: eval time = 161015.79 ms / 626 runs ( 257.21 ms per token, 3.89 tokens per second)
llama_perf_context_print: total time = 1208840.17 ms / 11473 tokens
>LLAMA-SERVER log
https://pastebin.com/ztLYiTfV
prompt eval time = 961813.26 ms / 10846 tokens ( 88.68 ms per token, 11.28 tokens per second)
eval time = 347536.86 ms / 811 tokens ( 428.53 ms per token, 2.33 tokens per second)
total time = 1309350.13 ms / 11657 tokens
tp: 3.89 t/s vs 2.33 t/s ==> a ~40% decrease in the case of LLAMA-SERVER
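The ~40% figure follows directly from the eval lines in the two logs (a quick sanity-check script over those numbers, not part of the original logs):

```python
# Recompute generation speed from the "eval time" lines of both logs above.

cli_eval_ms, cli_tokens = 161015.79, 626        # llama-cli: eval time / runs
server_eval_ms, server_tokens = 347536.86, 811  # llama-server: eval time / tokens

cli_tps = cli_tokens / (cli_eval_ms / 1000)          # tokens per second
server_tps = server_tokens / (server_eval_ms / 1000)
decrease = (1 - server_tps / cli_tps) * 100

print(f"llama-cli:    {cli_tps:.2f} t/s")    # 3.89 t/s
print(f"llama-server: {server_tps:.2f} t/s") # 2.33 t/s
print(f"decrease:     {decrease:.0f}%")      # 40%
```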
I'm on Linux, using the Brave browser.