Useful archiving efforts and other projects to help with, for people who are new to archiving and interested in it:
HIGH priority (If you don't help archive these automatically, the data will probably be lost forever):
1. http://warrior.archiveteam.org/
Help automatically archive things that are being shut down right now by running the ArchiveTeam Warrior program (or specific project containers) in the background:
Requirements: a few GB of disk space, some bandwidth and a small amount of CPU power. More info: https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior
If you learn that a site or any online data is in danger of shutting down, read through this page and contact ArchiveTeam on their IRC if required in order to have it archived: https://wiki.archiveteam.org/index.php/Projects
2. Use a browser extension to automatically forward URLs you browse that are not yet archived on https://archive.org to them for archival:
https://github.com/internetarchive/wayback-machine-webextension
MEDIUM priority (Important overall):
3. Seed torrents for as long as possible, and rare data forever. Make sure to look up a guide for your router to PORT FORWARD your torrent client port, which substantially increases your upload (and your download) speed. In low population torrent swarms, if no one has a forwarded port, peers might not be able to connect to each other at all and exchange any data despite having it.
Requirements: As much or as little bandwidth as you want (you can set limits if you need to)
https://github.com/qbittorrent/qBittorrent (Recommended client, especially to replace uTorrent)
4. Archive web pages you want to have a local copy of with a "Web Extension for saving a faithful copy of a complete web page in a single HTML file with a single click"
https://github.com/gildas-lormeau/SingleFile
5. Archive videos with "GUI front-end for youtube-dl, yt-dlp and other compatible video downloaders"
https://github.com/axcore/tartube
6. "Capture or record any area of your screen and share it with a single press of a key"
https://github.com/ShareX/ShareX
7. Archive entire websites you want to have a local copy of
https://www.httrack.com/
8. Publish the data you have archived that isn't easily available online, or isn't available at all. You can easily create a torrent in your torrent client and then share its magnet link anywhere online for anyone to access. As long as DHT (Distributed Hash Table, a decentralized way to share torrents without the need for any specific tracker) is enabled in settings (on by default), your files will be discoverable by DHT crawlers, local or online (for example https://btdig.com/, where you can also search for FILE NAMES within all DHT torrents).
(archive.org also creates torrents for all uploads automatically, but their torrents shouldn't be relied on: the implementation is error-prone, and they can break when more files are uploaded or when the item's metadata changes, which includes even a new comment on the item.)
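If you only have a torrent's info hash and want to hand out a magnet link by hand, the link can be put together roughly like this. This is a minimal sketch, assuming a v1 (SHA-1, 40 hex characters) info hash; `build_magnet` is a made-up helper name and the tracker URL in the usage note is a placeholder, not a real tracker. Your torrent client normally generates this link for you.

```python
from urllib.parse import quote


def build_magnet(info_hash, name="", trackers=()):
    """Build a magnet URI from a v1 BitTorrent info hash (hypothetical helper)."""
    info_hash = info_hash.strip().lower()
    # A v1 info hash is a SHA-1 digest: exactly 40 hex characters.
    if len(info_hash) != 40 or any(c not in "0123456789abcdef" for c in info_hash):
        raise ValueError("expected a 40-character hex SHA-1 info hash")

    parts = ["xt=urn:btih:" + info_hash]
    if name:
        # dn = display name shown before the metadata has been fetched
        parts.append("dn=" + quote(name, safe=""))
    for tracker in trackers:
        # tr = optional tracker announce URL; DHT works without any
        parts.append("tr=" + quote(tracker, safe=""))
    return "magnet:?" + "&".join(parts)
```

Example: `build_magnet("a" * 40, "my archive.zip", ["udp://tracker.example.invalid:1337/announce"])`. With no trackers given, the link relies on DHT alone to find peers.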
OTHER useful things:
- In your torrent client settings, set the best trackers to be added automatically to all newly added torrents (helps you connect to peers more easily, especially in obscure torrents):
https://github.com/ngosang/trackerslist
- Look into running a node for I2P (anonymous private network within the global internet):
Requirements: Mostly bandwidth, more info: https://geti2p.net/en/faq
https://geti2p.net/
- Look into running Tor / Hyphanet (Freenet) / IPFS / YaCy / SearXNG nodes.
- "A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI"
https://github.com/bitmagnet-io/bitmagnet
- "ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline"
https://github.com/ArchiveBox/ArchiveBox
- Look into donating your PC resources to be used more intensively in projects:
BOINC (Berkeley Open Infrastructure for Network Computing): https://boinc.berkeley.edu/projects.php
GIMPS (Great Internet Mersenne Prime Search): https://www.mersenne.org/
- Additional archiving tools: https://github.com/iipc/awesome-web-archiving
- Additional links to archiving and similar communities:
https://wiki.archiveteam.org/index.php/Archiveteam:IRC
https://www.reddit.com/r/Archiveteam
https://www.reddit.com/r/DataHoarder
https://www.reddit.com/r/DataHoarder/wiki/index/ - Hardware and software for data hoarding FAQ
https://www.reddit.com/r/lostmedia
https://www.reddit.com/r/GamePreservationists
https://www.reddit.com/r/torrents
https://www.reddit.com/r/qBittorrent
https://annas-archive.se/torrents
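For the tracker list mentioned under OTHER useful things: a rough sketch of cleaning up a downloaded tracker list before pasting it into the client setting. It assumes the list is a plain text file with one announce URL per line, possibly separated by blank lines, and that your client accepts one URL per line in that setting. Both are assumptions, so check your client. The function names are made up for illustration.

```python
def parse_tracker_list(text):
    """Return the unique tracker URLs from a plain-text list, in original order."""
    seen = set()
    trackers = []
    # split() handles newlines, blank lines and stray spaces alike
    for token in text.split():
        if "://" not in token or token in seen:
            continue  # skip non-URL junk and duplicates
        seen.add(token)
        trackers.append(token)
    return trackers


def to_client_setting(trackers):
    """Join trackers one per line, ready to paste into the client's text box."""
    return "\n".join(trackers)
```

Read the downloaded file with `open(path, encoding="utf-8").read()` and pass the text to `parse_tracker_list`.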
>>>/t/
What are you archiving or want to archive?
Do you have or know anyone who has some rare interesting data or media not available online?
>>105980167 (OP)
bump for awareness despite knowing no one ever replies to these threads but somehow OP still bothers to make them
how do i archive a site locally that has some modern js in it? httrack doesn't work properly
>>105983366
Depends on the amount and type of JS on it. You might need to use Puppeteer or Selenium, but start by trying https://github.com/ArchiveBox/ArchiveBox
>>105980337
>>105981899
Don't bump it. Nobody cares about being a node for a wider archival system for things they personally don't care about.
A thread talking about how to archive and backup what each of us cares about would be far more interesting.
>>105983872
>Don't bump it. Nobody cares about being a node for a wider archival system for things they personally don't care about.
Thousands already care.
>A thread talking about how to archive and backup what each of us cares about would be far more interesting.
There aren't enough people on this board to have a discussion about each specialized archival topic.
>>105983899
>specialized archival topic
That's what this thread is, though.
Make a broader archival/backup general if you want a thread with broader appeal. Even a thread specifically about personal backups would be more popular.
>>105983919
How much broader can it get?
>personal backups
That's just a matter of which hardware and software each user needs and can afford, which is different for everyone and already covered at https://www.reddit.com/r/DataHoarder/wiki/index/
>>105983940
This thread is highly specific.
>>105983426
i also had issues with cloudflare anti-scraping, not sure if delaying helps
is there a good alt version of httrack? it's old
how much internet do anons have downloaded?
>>105983940
>>105983899
You've made this thread over 250 times. Those who care already do this useless crap. You're not going to evangelize further.
>>105983366
custom python script, look at which API endpoints the JS calls and call them with requests in python.
Copy the cookie from your browser if logging in is needed.
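Something like this is a starting point for that approach. It's only a sketch: it uses the standard library's urllib instead of requests so it runs without installing anything, and the endpoint URL and cookie value in the usage note are placeholders you'd replace with what you see in your browser's network tab. The function names are made up.

```python
import json
import urllib.request


def build_api_request(url, cookie_header="", user_agent="Mozilla/5.0"):
    """Prepare a GET request that looks like the site's own JS calling its API."""
    headers = {"Accept": "application/json", "User-Agent": user_agent}
    if cookie_header:
        # Paste the Cookie header value copied from your logged-in browser session
        headers["Cookie"] = cookie_header
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, cookie_header="", timeout=30):
    """Call the endpoint and decode the JSON response body."""
    req = build_api_request(url, cookie_header)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def save_json(data, path):
    """Write the response to disk so you keep a local copy."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=2)
```

Usage would be along the lines of `save_json(fetch_json("https://example.invalid/api/posts?page=1", "session=PASTE_HERE"), "page1.json")`, looping over pages or IDs as needed. This only saves the data behind the page, not the page itself.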