7/9/2025, 9:08:37 AM
>>105845652
From experience and observation, an 8B model trained on 2,500 samples at 4k tokens for 4 epochs would have been more than acceptable a couple of years ago, but I don't think most people will settle for less than a 24B model nowadays (x3 compute), trained with at least 16k context (x4 compute). So we're looking at at least 12x more compute for an otherwise basic RP finetune, before even counting the time required for tests and ablations.
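A quick back-of-the-envelope sketch of that 12x figure, assuming training compute scales roughly linearly with parameter count and with total tokens seen (samples x context length x epochs), and ignoring the quadratic attention cost at longer context, so treat it as a lower bound:

def finetune_compute(params_b, samples, ctx_tokens, epochs):
    # Relative compute units: params (billions) * total tokens trained on.
    return params_b * samples * ctx_tokens * epochs

old = finetune_compute(params_b=8,  samples=2500, ctx_tokens=4_000,  epochs=4)
new = finetune_compute(params_b=24, samples=2500, ctx_tokens=16_000, epochs=4)

print(f"param scaling:   x{24 / 8:.0f}")         # x3
print(f"context scaling: x{16_000 / 4_000:.0f}")  # x4
print(f"total:           x{new / old:.0f}")       # x12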