Anonymous /g/105832690#105842616
7/9/2025, 12:53:00 AM
>>105842418
I had Claude write me the training script. It's just reading the pretokenized chunks from an Arrow file, and it's super stable. Generating the Arrow file was devastating to my RAM and SSDs, but now that I have the dataset compiled, the training script just feeds the chunks like clockwork. It hasn't crashed since I initially dialed in my model size, back when I had no idea what VRAM use to expect. It's running an effective batch size of 64, so at an 8192 sequence length it's eating over half a million tokens (64 × 8192 = 524,288) every step.
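The script itself isn't shown, but the batching arithmetic described above can be sketched in plain Python. This is a minimal sketch assuming each pretokenized chunk is already exactly 8192 tokens long and that the effective batch of 64 is reached with gradient accumulation. `MICRO_BATCH = 8`, `optimizer_steps`, and `tokens_per_step` are hypothetical names, and the Arrow reading is left out: a plain list of token lists stands in for the file's rows.

```python
from typing import Iterator, List

SEQ_LEN = 8192        # sequence length from the post
EFFECTIVE_BATCH = 64  # effective batch size from the post
MICRO_BATCH = 8       # hypothetical: sequences per forward/backward pass
ACCUM_STEPS = EFFECTIVE_BATCH // MICRO_BATCH  # micro-batches accumulated per optimizer step


def tokens_per_step(effective_batch: int, seq_len: int) -> int:
    """Tokens consumed by one optimizer step."""
    return effective_batch * seq_len


def optimizer_steps(
    chunks: List[List[int]], micro_batch: int, accum_steps: int
) -> Iterator[List[List[List[int]]]]:
    """Group pretokenized, fixed-length chunks into optimizer steps.

    Each yielded step is a list of `accum_steps` micro-batches, each holding
    `micro_batch` chunks. A trailing partial step is dropped so every step
    sees exactly micro_batch * accum_steps sequences.
    """
    per_step = micro_batch * accum_steps
    for start in range(0, len(chunks) - per_step + 1, per_step):
        step = chunks[start:start + per_step]
        # Split the step's sequences into consecutive micro-batches.
        yield [step[i:i + micro_batch] for i in range(0, per_step, micro_batch)]


if __name__ == "__main__":
    print(tokens_per_step(EFFECTIVE_BATCH, SEQ_LEN))  # 524288
```

With the post's numbers, `tokens_per_step(64, 8192)` gives 524,288, the "over half a million tokens every step". Splitting the 64 sequences into smaller micro-batches is a common way to fit a large effective batch into limited VRAM, at the cost of more forward/backward passes per optimizer step.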

I think 150 GB might be on the really low end; it's only around 40B tokens, and most base models are trained on trillions of tokens. I'm just hoping my data is high enough quality and the domain is constrained enough.
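A rough back-of-the-envelope check of those figures, as a sketch: it reads "150gb" as 150 × 10^9 bytes (an assumption, since the post doesn't say whether that's raw text or the tokenized Arrow file) and uses 1 trillion tokens as a hypothetical stand-in for "trillions". The function names are illustrative only.

```python
DATASET_BYTES = 150 * 10**9   # "150gb" from the post, read as decimal GB (assumption)
DATASET_TOKENS = 40 * 10**9   # "around 40B tokens" from the post
TOKENS_PER_STEP = 64 * 8192   # effective batch 64 at 8192 sequence length
REFERENCE_TOKENS = 10**12     # hypothetical: 1T tokens as a stand-in for "trillions"


def bytes_per_token(n_bytes: int, n_tokens: int) -> float:
    """Average bytes of data per token implied by the two figures."""
    return n_bytes / n_tokens


def steps_per_epoch(n_tokens: int, tokens_per_step: int) -> int:
    """Optimizer steps needed for one full pass, rounding up (ceiling division)."""
    return -(-n_tokens // tokens_per_step)


def fraction_of(n_tokens: int, reference_tokens: int) -> float:
    """Dataset size as a fraction of a reference token budget."""
    return n_tokens / reference_tokens


if __name__ == "__main__":
    print(bytes_per_token(DATASET_BYTES, DATASET_TOKENS))    # 3.75
    print(steps_per_epoch(DATASET_TOKENS, TOKENS_PER_STEP))  # 76294
    print(fraction_of(DATASET_TOKENS, REFERENCE_TOKENS))     # 0.04
```

The implied 3.75 bytes per token is close to the common rule of thumb of roughly 4 characters per token for English text with BPE tokenizers. One pass over 40B tokens at 524,288 tokens per step is about 76k optimizer steps, and 40B tokens is about 4% of a 1T-token budget.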