Search Results

Found 1 result for "aa76ea803879bb56e149a0beea632c8e" across all boards, searching md5.

Anonymous /g/106123208#106125777
8/3/2025, 1:53:52 PM
======PSA NVIDIA ACTUALLY FUCKED UP CUDA======
cuda 12.8 570.86.10:
got prompt
Loading model and applying LoRA weights:: 100%|██████████| 731/731 [00:39<00:00, 18.69it/s]
Sampling 81 frames at 640x480 with 4 steps
100%|██████████| 4/4 [02:46<00:00, 41.51s/it]
VAE decoding: 100%|██████████| 2/2 [00:20<00:00, 10.25s/it]
*****Prompt executed in 246.59 seconds
got prompt
Initializing block swap: 100%|██████████| 40/40 [00:00<00:00, 6499.02it/s]
Sampling 81 frames at 640x480 with 4 steps
100%|██████████| 4/4 [02:46<00:00, 41.67s/it]
VAE decoding: 100%|██████████| 2/2 [00:20<00:00, 10.21s/it]
*****Prompt executed in 188.62 seconds
got prompt
Initializing block swap: 100%|██████████| 40/40 [00:00<00:00, 4924.34it/s]
Sampling 81 frames at 640x480 with 4 steps
100%|██████████| 4/4 [02:57<00:00, 44.36s/it]
VAE decoding: 100%|██████████| 2/2 [00:23<00:00, 11.65s/it]
*****Prompt executed in 202.30 seconds
i first noticed this when updating from cuda 12.6 to cuda 12.8 to test out sageattention 2++, and generation got slower. i reverted sageattention to the previous version and the speed stayed the same (still slower). then i reverted to cuda 12.6 (simply moved the /usr/local/cuda link aside to /usr/local/cuda.new and made a new link: ln -s /usr/local/cuda12.6 /usr/local/cuda) and the speed came back. if you still have an older version of cuda installed, it's worth checking out. drivers also play a role but it's negligible (see picrel)
ps: this was sageattn2 from right before the 2++ update, on pytorch 2.7.1+cu128 (even when testing with cuda 12.6)
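the revert is just a mv and an ln; here's the same swap sketched in a throwaway dir so the mechanism is clear (the real paths are /usr/local/cuda and your installed toolkit dirs, and the real commands need sudo — names here are only for illustration):

```shell
# scratch dir standing in for /usr/local
root=$(mktemp -d)
mkdir -p "$root/cuda12.6" "$root/cuda12.8"

# starting state: the generic "cuda" symlink points at 12.8
ln -s "$root/cuda12.8" "$root/cuda"

# park the current link, then point a fresh link at 12.6
# (mv moves the symlink itself, it does not follow it)
mv "$root/cuda" "$root/cuda.new"
ln -s "$root/cuda12.6" "$root/cuda"

# confirm which toolkit the link resolves to now
readlink "$root/cuda"
```

anything that finds cuda via /usr/local/cuda (PATH, nvcc, build scripts) picks up the older toolkit on the next run, no reinstall needed.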
don't believe me? a quick search gets you:
https://github.com/pytorch/pytorch/issues/155607
https://www.reddit.com/r/LocalLLaMA/comments/1jlofc7/performance_regression_in_cuda_workloads_with/ (all 3000 series)
an anon (3090) also reports a big speedup after switching from cuda 12.8 to cuda 12.6 >>106121370
t. 3060 12gb + 64gb ddr4 ram
might only apply to the 3000 series but idk