Anonymous /g/105800515#105801625
7/4/2025, 9:16:56 PM
>>105801590
You don't know how bad the situation really is. They're basically throwing shit at the models during pretraining, while taking high-effort documents away just because they contain "bad words". Picrel is an example document from FineWeb (supposedly a high-quality pretraining dataset). Yes, that's the entire document.