Anonymous
10/24/2025, 9:16:30 PM
No.106996947
In case someone out there is curious, really poor, and masochistic. I have DDR4 and an old CPU, so regular RAM is really slow for GLM Air. Had some vbios and regular BIOS hiccups, but it worked out thanks to some other posts. Very finicky GPU.
llama.cpp compiled with both cuda 12.8 and rocm 7.02 on 3090+MI50 32gb ubuntu 24.04 lts
mistral large 123b IQ3XS
prompt eval time = 7807.84 ms / 532 tokens ( 14.68 ms per token, 68.14 tokens per second)
eval time = 10842.38 ms / 54 tokens ( 200.78 ms per token, 4.98 tokens per second)
total time = 18650.22 ms / 586 tokens
glm air 106ba12b IQ3XS
prompt eval time = 1736.62 ms / 460 tokens ( 3.78 ms per token, 264.88 tokens per second)
eval time = 4486.81 ms / 129 tokens ( 34.78 ms per token, 28.75 tokens per second)
total time = 6223.44 ms / 589 tokens
vulkan 3090+MI50 32gb ubuntu
mistral large 123b IQ3XS
prompt eval time = 18885.73 ms / 532 tokens ( 35.50 ms per token, 28.17 tokens per second)
eval time = 20222.64 ms / 132 tokens ( 153.20 ms per token, 6.53 tokens per second)
total time = 39108.37 ms / 664 tokens
glm air 106ba12b IQ3XS
prompt eval time = 3300.40 ms / 460 tokens ( 7.17 ms per token, 139.38 tokens per second)
eval time = 5011.15 ms / 96 tokens ( 52.20 ms per token, 19.16 tokens per second)
total time = 8311.55 ms / 556 tokens
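If anyone wants to compare the two backends without eyeballing the logs, here is a rough Python sketch that parses the timing lines and prints the ratio of tokens per second. The `LOGS` dict, the backend labels, and the function names `parse_timings` and `ratio` are made up for illustration and are not part of llama.cpp; the numbers are copied from the logs above.

```python
import re

# Timing lines copied from the post, keyed by (backend label, model).
# The labels are arbitrary names chosen for this sketch.
LOGS = {
    ("cuda+rocm", "mistral large 123b"): """
prompt eval time = 7807.84 ms / 532 tokens ( 14.68 ms per token, 68.14 tokens per second)
eval time = 10842.38 ms / 54 tokens ( 200.78 ms per token, 4.98 tokens per second)
""",
    ("cuda+rocm", "glm air 106ba12b"): """
prompt eval time = 1736.62 ms / 460 tokens ( 3.78 ms per token, 264.88 tokens per second)
eval time = 4486.81 ms / 129 tokens ( 34.78 ms per token, 28.75 tokens per second)
""",
    ("vulkan", "mistral large 123b"): """
prompt eval time = 18885.73 ms / 532 tokens ( 35.50 ms per token, 28.17 tokens per second)
eval time = 20222.64 ms / 132 tokens ( 153.20 ms per token, 6.53 tokens per second)
""",
    ("vulkan", "glm air 106ba12b"): """
prompt eval time = 3300.40 ms / 460 tokens ( 7.17 ms per token, 139.38 tokens per second)
eval time = 5011.15 ms / 96 tokens ( 52.20 ms per token, 19.16 tokens per second)
""",
}

# Matches "prompt eval time = X ms / N tokens" and "eval time = X ms / N tokens".
# The ^ anchor keeps "eval time" from matching inside "prompt eval time".
LINE_RE = re.compile(
    r"^\s*(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) tokens", re.M
)


def parse_timings(text):
    """Return {"prompt eval": tok/s, "eval": tok/s} recomputed from ms and token counts."""
    rates = {}
    for kind, ms, tokens in LINE_RE.findall(text):
        rates[kind] = int(tokens) / (float(ms) / 1000.0)
    return rates


def ratio(model, kind, a="cuda+rocm", b="vulkan"):
    """Tokens-per-second ratio of backend a over backend b; above 1 means a is faster."""
    return parse_timings(LOGS[(a, model)])[kind] / parse_timings(LOGS[(b, model)])[kind]


if __name__ == "__main__":
    for model in ("mistral large 123b", "glm air 106ba12b"):
        for kind in ("prompt eval", "eval"):
            print(f"{model:20s} {kind:12s} cuda+rocm/vulkan = {ratio(model, kind):.2f}x")
```

A ratio below 1 means Vulkan was faster for that row, which going by the logs is the case for mistral large generation (6.53 vs 4.98 tokens per second). These are single runs with different generated token counts, so treat the ratios as ballpark.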