>>105686400 (OP)Anything and everything. Websites that own large repositories of data and forums are going to start shutting down access to them, you won't be able to scrape them anymore because AI companies are scraping the web to train their LLMs. Website owners and companies are going to start realizing the value of data and will gate keep it.
ANY content prior to the advent and ubiquitous spread of ChatGPT (like 2022 or something) is invaluable, as it is irrefutable proof (for the most part) that the content was generated by actual humans, and companies don't want to train their LLMs on other LLMs.
Additionally, if you want to resist the demonic AI hellhole that is our future, you'll also want access to content that you're sure was created by humans. Forums, book archives, Wikis, etc.. AI is going to taint EVERYTHING. Not just content on the web, EVERYTHING.
The future internet is AI generated slop, the idea of "Searching the web" will be antiquated and inaccessible for most people. Archive anything you can, while you can.