>>106531399

1) I couldn't care less whether or not (YOU) use it, I'm just sharing it
2) Use a trainer that supports streaming.

https://docs.axolotl.ai/docs/streaming.html

You don't HAVE to load the entire thing into VRAM. Even the companies that have rooms upon rooms of GPUs don't load the entire data set into RAM, because that's an idiotic and inefficient way to do it. You load little pieces of it in, train on those, then offload and load the next piece of the data set in and train on that. Do that until you've gone over the entire data set, at which point you've completed one epoch, and then you do that again for the remaining epochs/steps.
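That loop is dead simple to sketch. This is a pure-Python stand-in, not axolotl's actual API (the names `load_chunks` and `train` are made up for illustration); the point is just that only one slice ever needs to be resident at a time:

```python
def load_chunks(dataset, chunk_size):
    """Yield the data set one small piece at a time instead of all at once."""
    for i in range(0, len(dataset), chunk_size):
        yield dataset[i:i + chunk_size]  # only this slice is "loaded"

def train(dataset, chunk_size, epochs):
    steps = 0
    for epoch in range(epochs):
        # one full pass over all the chunks == one epoch
        for chunk in load_chunks(dataset, chunk_size):
            # stand-in for an actual optimizer step on this chunk
            steps += len(chunk)
    return steps

# toy "data set": 10 samples, streamed 3 at a time, 2 epochs
total = train(list(range(10)), chunk_size=3, epochs=2)
print(total)  # 20 -> every sample seen once per epoch, twice total
```

A real streaming trainer does the same thing, just with an iterable dataset reading from disk and prefetching the next chunk while the current one trains.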

You also act like training on 2 gigs of data alone is actually a lot. Sure, it'll take way longer than training on something smaller like a few MB, but I don't know why you think it can only be done with data-center-grade hardware or a giant cluster or some shit.