In case someone out there is curious, really poor, and masochistic: I have DDR4 and an old CPU, so regular RAM is really slow for GLM Air. I had some VBIOS and regular BIOS hiccups, but it worked out thanks to some other posts. Very finicky GPU.

llama.cpp compiled with both CUDA 12.8 and ROCm 7.0.2 on 3090 + MI50 32GB, Ubuntu 24.04 LTS

Mistral Large 123B IQ3_XS
prompt eval time = 7807.84 ms / 532 tokens ( 14.68 ms per token, 68.14 tokens per second)
eval time = 10842.38 ms / 54 tokens ( 200.78 ms per token, 4.98 tokens per second)
total time = 18650.22 ms / 586 tokens

GLM Air 106B-A12B IQ3_XS
prompt eval time = 1736.62 ms / 460 tokens ( 3.78 ms per token, 264.88 tokens per second)
eval time = 4486.81 ms / 129 tokens ( 34.78 ms per token, 28.75 tokens per second)
total time = 6223.44 ms / 589 tokens



Vulkan on 3090 + MI50 32GB, Ubuntu

Mistral Large 123B IQ3_XS
prompt eval time = 18885.73 ms / 532 tokens ( 35.50 ms per token, 28.17 tokens per second)
eval time = 20222.64 ms / 132 tokens ( 153.20 ms per token, 6.53 tokens per second)
total time = 39108.37 ms / 664 tokens

GLM Air 106B-A12B IQ3_XS
prompt eval time = 3300.40 ms / 460 tokens ( 7.17 ms per token, 139.38 tokens per second)
eval time = 5011.15 ms / 96 tokens ( 52.20 ms per token, 19.16 tokens per second)
total time = 8311.55 ms / 556 tokens
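
If anyone wants to compare the two builds side by side, here is a rough Python sketch that recomputes tokens per second from the timing lines above and prints the Vulkan speed as a fraction of the CUDA+ROCm speed. The labels ("CUDA+ROCm", "Vulkan", "Mistral Large", "GLM Air") and the helper names are just mine for illustration, and each number comes from a single run with a different number of generated tokens, so treat the ratios as ballpark only.

```python
import re

# Timing lines copied from the four runs in this post (single runs, not averages).
LOGS = {
    ("CUDA+ROCm", "Mistral Large"): """
prompt eval time = 7807.84 ms / 532 tokens
eval time = 10842.38 ms / 54 tokens
""",
    ("CUDA+ROCm", "GLM Air"): """
prompt eval time = 1736.62 ms / 460 tokens
eval time = 4486.81 ms / 129 tokens
""",
    ("Vulkan", "Mistral Large"): """
prompt eval time = 18885.73 ms / 532 tokens
eval time = 20222.64 ms / 132 tokens
""",
    ("Vulkan", "GLM Air"): """
prompt eval time = 3300.40 ms / 460 tokens
eval time = 5011.15 ms / 96 tokens
""",
}

# Matches "prompt eval time = X ms / N tokens" and "eval time = X ms / N tokens".
TIMING = re.compile(
    r"^\s*(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) tokens",
    re.MULTILINE,
)


def parse(log):
    """Return {"prompt eval": tok/s, "eval": tok/s} recomputed from ms and token counts."""
    return {
        kind: int(tokens) / (float(ms) / 1000.0)
        for kind, ms, tokens in TIMING.findall(log)
    }


speeds = {key: parse(log) for key, log in LOGS.items()}


def vulkan_vs_native(model, kind):
    """Vulkan tokens/s divided by CUDA+ROCm tokens/s (above 1.0 means Vulkan was faster)."""
    return speeds[("Vulkan", model)][kind] / speeds[("CUDA+ROCm", model)][kind]


if __name__ == "__main__":
    for model in ("Mistral Large", "GLM Air"):
        for kind in ("prompt eval", "eval"):
            native = speeds[("CUDA+ROCm", model)][kind]
            vulkan = speeds[("Vulkan", model)][kind]
            ratio = vulkan_vs_native(model, kind)
            print(f"{model:14s} {kind:12s} CUDA+ROCm {native:7.2f} t/s  "
                  f"Vulkan {vulkan:7.2f} t/s  ratio {ratio:.2f}")
```

On these runs Vulkan comes out slower everywhere except token generation on Mistral Large, where it was somewhat faster.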