Anonymous
8/18/2025, 4:42:59 AM
No.106297693
I am trying to understand how LLMs work, and one thing that confuses me is how the weights are tuned without old knowledge getting lost. If you train the model to predict biology tokens and then train it on legal documents, shouldn't the biological knowledge that was encoded in the weights be completely overwritten? Are they changing the weights in non-repeating chunks? Is the training data just one continuous string, so all of it is always relevant?
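The effect being asked about is real and is called catastrophic forgetting; the standard mitigation during pretraining is shuffling documents from all domains together so every gradient step sees a mix. It can be reproduced in a toy setting: below is a minimal sketch (a made-up two-weight model with invented quadratic "biology" and "legal" losses, nothing like a real LLM) where training fully on one loss and then the other degrades the first, while interleaving the two keeps both low.

```python
# Toy demonstration of catastrophic forgetting vs. interleaved (shuffled) training.
# Hypothetical 2-parameter model; "biology" and "legal" are stand-in quadratic
# losses that overlap on the shared parameter w[0].

LR = 0.1      # learning rate
STEPS = 2000  # gradient steps per phase

def loss_bio(w):
    # "biology" loss: (w1 - 1)^2 -- minimized when w[0] == 1, ignores w[1]
    return (w[0] - 1) ** 2

def loss_legal(w):
    # "legal" loss: (w1 + w2)^2 -- minimized when w[0] + w[1] == 0
    return (w[0] + w[1]) ** 2

def grad_bio(w):
    return [2 * (w[0] - 1), 0.0]

def grad_legal(w):
    s = 2 * (w[0] + w[1])
    return [s, s]

def step(w, grad):
    # one plain gradient-descent update
    return [wi - LR * gi for wi, gi in zip(w, grad)]

# Sequential: all biology first, then all legal.
# The legal phase drags the shared weight w[0] away from the biology optimum.
w = [0.0, 0.0]
for _ in range(STEPS):
    w = step(w, grad_bio(w))
for _ in range(STEPS):
    w = step(w, grad_legal(w))
seq_bio_loss = loss_bio(w)  # ends up well above zero: biology was "forgotten"

# Interleaved: alternate one step of each, like shuffling the corpus so every
# mini-batch mixes domains. The model settles where both losses are low.
w = [0.0, 0.0]
for _ in range(STEPS):
    w = step(w, grad_bio(w))
    w = step(w, grad_legal(w))
mix_bio_loss = loss_bio(w)
mix_legal_loss = loss_legal(w)

print(seq_bio_loss, mix_bio_loss, mix_legal_loss)
```

So the answer to the last question is roughly yes: pretraining data is shuffled into one mixed stream of mini-batches rather than fed domain by domain, which is why one domain doesn't simply overwrite another. Sequentially fine-tuning a finished model on a single narrow domain really can cause this kind of forgetting.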