>>107114318
Yes learning is much much more stable. Which really goes to show people could be training models from scratch all along, this is just a single 5090 with 1.5 million images. Imagine what a 8x GPU cluster could do, could do like a 25 million dataset on a 4-8B model in a couple of months.