>>106503275
>Have you tried 768? 1024 kills my machine
Yes. 640 is decent too. I can go up to 1280 without block swapping at batch size 1 with float8 (24GB VRAM). I tested float8 against validation and the loss stays within 0.0001-0.0005 of the run without float8. The difference is so small that I just leave it on for flexibility
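If you want to run the same check yourself, here is a rough sketch of the comparison, assuming you log validation loss at the same steps for both runs. The function name `loss_gap` and the numbers are placeholders I made up for illustration, not my actual measurements and not part of any trainer's API.

```python
def loss_gap(baseline, float8):
    """Return (smallest, largest) absolute gap between two runs' validation losses.

    Both arguments are lists of losses logged at the same validation steps.
    """
    if not baseline or len(baseline) != len(float8):
        raise ValueError("need two non-empty lists of equal length")
    gaps = [abs(a - b) for a, b in zip(baseline, float8)]
    return min(gaps), max(gaps)


if __name__ == "__main__":
    # Placeholder values only, NOT real results: swap in your own logs.
    baseline_losses = [0.1200, 0.1150, 0.1100]
    float8_losses = [0.1201, 0.1153, 0.1105]
    smallest, largest = loss_gap(baseline_losses, float8_losses)
    print(f"gap range: {smallest:.4f} to {largest:.4f}")
```

If the largest gap stays far below the step-to-step noise in your validation loss, leaving float8 on is probably fine for your setup too.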