llama.cpp CUDA dev
!!yhbFjk57TDr
8/3/2025, 2:36:28 PM
No.106125984
>>106125806
This could be an application software issue rather than a CUDA issue.
Choosing which kernel to run for a given operation is extremely finicky and the choice may depend on the CUDA version.
Just recently I found that the kernel selection logic I made for my consumer GPUs at stock settings is suboptimal for the same GPUs with a frequency limit (up to ~25% end-to-end difference).
So conceivably, since datacenter GPUs tend to have lower frequencies than consumer GPUs, some component in the software stack is choosing a kernel that is only available with CUDA 12.8 and that is faster on datacenter GPUs but slower on consumer GPUs.
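To illustrate the kind of failure I mean, here is a made-up sketch. It is not actual llama.cpp code: the struct, the function names, the variant names, and the clock threshold are all hypothetical, and whether a lower clock really favors the newer kernel would have to be benchmarked. The only real convention used is the CUDART_VERSION-style encoding where CUDA 12.8 is 12080.

```cpp
// Hypothetical device/toolkit description; none of these names come from llama.cpp or CUDA.
struct DeviceInfo {
    int cuda_version; // encoded like CUDART_VERSION, e.g. 12080 for CUDA 12.8
    int sm_clock_mhz; // effective SM clock, after any frequency limit
};

// Two hypothetical implementations of the same operation.
enum class KernelVariant { Baseline, NewVariant };

// Heuristic tuned only on low-clocked GPUs: uses the new kernel whenever the toolkit provides it.
// On a high-clocked consumer GPU this can pick the slower variant.
KernelVariant select_kernel_naive(const DeviceInfo & dev) {
    return dev.cuda_version >= 12080 ? KernelVariant::NewVariant : KernelVariant::Baseline;
}

// Same heuristic, but it also looks at the clock. The threshold is an assumption
// and would need to be determined by measuring both variants on the target GPU.
KernelVariant select_kernel_clock_aware(const DeviceInfo & dev, int clock_threshold_mhz) {
    if (dev.cuda_version < 12080) {
        return KernelVariant::Baseline; // new kernel is not available at all
    }
    return dev.sm_clock_mhz <= clock_threshold_mhz ? KernelVariant::NewVariant
                                                   : KernelVariant::Baseline;
}
```

With the naive version, upgrading the toolkit from 12.6 to 12.8 silently switches every GPU to the new kernel, which would look like a CUDA regression on consumer cards even though the selection logic is what changed.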