Anonymous /g/106113484#106117701
8/2/2025, 6:35:05 PM
>>106117295
In my (limited) experience, it's actually much harder to get a model to converge when you feed it long sequences from the start; starting with short sequences and ramping up does seem to be the way to go. That being said, I think they are way overcooking them at short sequence lengths. My current approach is to ramp up the context length quickly, then switch back to short sequences for the main run, and ramp up again at the end, finishing with the long sequences. I have no basis for comparison on the final result, so it's not a real experiment; it will either work or it won't.
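A minimal sketch of the schedule described above, assuming linear ramps and step-based phases. The function name `seq_len_schedule`, the phase lengths, the 256/4096 sequence lengths, and the rounding to a multiple of 64 are all hypothetical placeholders, not the poster's actual settings.

```python
def seq_len_schedule(
    step: int,
    short_len: int = 256,          # hypothetical short context length
    long_len: int = 4096,          # hypothetical target context length
    warm_ramp_steps: int = 1_000,  # hypothetical length of the quick initial ramp
    main_steps: int = 50_000,      # hypothetical length of the short-sequence main run
    final_ramp_steps: int = 10_000,  # hypothetical length of the closing ramp
    multiple_of: int = 64,         # round lengths down to a hardware-friendly multiple
) -> int:
    """Return the sequence length to train on at `step`.

    Phases:
      1. quick ramp: short_len -> long_len over warm_ramp_steps
      2. main run: constant short_len for main_steps
      3. final ramp: short_len -> long_len over final_ramp_steps
      4. hold long_len for any remaining steps
    """

    def ramp(progress: float) -> int:
        # Linear interpolation between short_len and long_len, progress in [0, 1).
        length = short_len + progress * (long_len - short_len)
        # Round down to a multiple of `multiple_of`, but never below short_len.
        return max(short_len, int(length) // multiple_of * multiple_of)

    # Phase 1: quick ramp up to the long context.
    if step < warm_ramp_steps:
        return ramp(step / warm_ramp_steps)
    step -= warm_ramp_steps

    # Phase 2: drop back to short sequences for the bulk of training.
    if step < main_steps:
        return short_len
    step -= main_steps

    # Phase 3: ramp up again toward the long context.
    if step < final_ramp_steps:
        return ramp(step / final_ramp_steps)

    # Phase 4: finish on the long sequences.
    return long_len


if __name__ == "__main__":
    for s in (0, 500, 1_000, 30_000, 56_000, 61_000):
        print(s, seq_len_schedule(s))
```

A linear ramp is just the simplest choice here; a step-wise or exponential ramp (e.g. doubling the length at fixed intervals) would slot into `ramp` the same way.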