===UPLIFTING NEWS===
Driver 570.133.07 with CUDA 12.8 is magically no longer fucked up. Maybe the old PyTorch cu128 builds had fucked up kernels (source: cudadev).
In fact, it's faster now!
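
If you want to check what you're actually running before blaming the driver, here's a rough sketch. It assumes nvidia-smi is on your PATH and that torch may or may not be installed. The helper names (version_tuple, driver_version, torch_cuda_build) are made up for illustration and aren't from any library.

```python
import shutil
import subprocess


def version_tuple(v):
    """Turn '570.133.07' into (570, 133, 7) so versions compare numerically."""
    return tuple(int(part) for part in v.strip().split("."))


def driver_version():
    """Driver version as reported by nvidia-smi, or None if it can't be queried."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    # One line per GPU; the driver is system-wide, so the first line is enough.
    return lines[0].strip()


def torch_cuda_build():
    """CUDA version the installed PyTorch build targets, or None."""
    try:
        import torch
    except ImportError:
        return None
    # A string like '12.8' for a cu128 wheel; None for CPU-only builds.
    return torch.version.cuda


if __name__ == "__main__":
    print("driver:", driver_version())
    print("torch CUDA build:", torch_cuda_build())
```

version_tuple is there so you can compare your driver against 570.133.07 numerically instead of as strings.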