>>106398904
No idea, but a loss that starts well below 1.0 tells me the training data is mostly slop that the Llama base models either find very familiar or very easy to digest.
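Rough back-of-the-envelope on why that's suspicious, assuming the plotted loss is mean cross-entropy in nats per token (the usual convention, but an assumption about this particular run):

```python
import math

# Cross-entropy loss L implies the model assigns the correct next token
# an average probability of exp(-L), i.e. perplexity exp(L).
for loss in (2.0, 1.0, 0.7, 0.5):
    avg_prob = math.exp(-loss)   # average probability on the right token
    ppl = math.exp(loss)         # equivalent perplexity
    print(f"loss={loss:.1f} -> avg token prob ~{avg_prob:.2f}, ppl ~{ppl:.2f}")
```

Loss under 1.0 means the model is already putting >37% average probability on each correct token from step one, which you'd only expect on data the base model has basically memorized or that is extremely predictable.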